Abstract

We present a performance analysis for image registration with gradient descent methods. 
We consider a typical multiscale registration setting where the global 2-D translation between a 
pair of images is estimated by smoothing the images and minimizing the distance between them 
with gradient descent. Our study particularly concentrates on the effect of noise and low-pass 
filtering on the alignment accuracy. We adopt an analytic representation for images and analyze 
the well-behavedness of the image distance function by estimating the neighborhood of trans- 
lations for which it is free of undesired local minima. This corresponds to the neighborhood of 
translation vectors that are correctly computable with a simple gradient descent minimization. 
We show that the area of this neighborhood increases at least quadratically with the smoothing 
filter size, which justifies the use of a smoothing step in image registration with local optimizers 
such as gradient descent. We then examine the effect of noise on the alignment accuracy and 
derive an upper bound for the alignment error in terms of the noise properties and filter size. 
Our main finding is that the error increases at a rate that is at least linear with respect to 
the filter size. Therefore, smoothing improves the well-behavedness of the distance function; 
however, this comes at the cost of amplifying the alignment error in noisy settings. Our results 
provide a mathematical insight about why hierarchical techniques are effective in image regis- 
tration, suggesting that the multiscale coarse-to-fine alignment strategy of these techniques is 
very suitable from the perspective of the trade-off between the well-behavedness of the objective 
function and the registration accuracy. To the best of our knowledge, this is the first such study 
for descent-based image registration. 

Keywords. Image registration, hierarchical registration methods, image smoothing, gradient 
descent, performance analysis. 

1 Introduction 

The estimation of the relative motion between two images is one of the important problems of 
image processing. The necessity for registering images arises in many different applications like 
image analysis and classification [1], [2], [3]; biomedical imaging [4]; stereo vision [5]; and motion
estimation for video coding [6]. The alignment of an image pair typically requires the optimization
of a dissimilarity (or similarity) measure, whose common examples are sum-of-squared difference
(SSD), approximations of SSD, and cross-correlation [7], [8]. Many registration techniques
adopt, or can be coupled with, a multiscale hierarchical search strategy. In hierarchical regis- 
tration, reference and target images are aligned by applying a coarse-to-fine estimation of the 
transformation parameters, using a pyramid of low-pass filtered and downsampled versions of 
the images. Coarse scales of the pyramid are used for a rough estimation of the transformation 
parameters. These scales have the advantage that the solution is less likely to get trapped into
the local minima of the dissimilarity function as the images are smoothed by low-pass filtering. 
Moreover, the search complexity is lower at coarse scales as the image pair is downsampled 
accordingly. The alignment is then refined gradually by moving on to the finer scales. Since it 
offers a good compromise between complexity and accuracy, the coarse-to-fine alignment strategy
has been widely used in many registration and motion estimation applications [3], [5], [10].

In this work, we present a theoretical study that analyzes the effect of smoothing on the per- 
formance of image registration. One of our main goals is to understand better the mathematical 
principles behind multiscale registration techniques. Most theoretical results in the image registration
literature (e.g., [13], [14], [15]) investigate how image noise affects the registration
accuracy. However, the analysis of the effect of smoothing on the registration performance has 
generally been given little attention in the literature. Although it is widely known as a practical
fact that smoothing an image pair is helpful for overcoming the undesired local minima of
the dissimilarity function [16], [6], to the best of our knowledge, this has not been extensively
studied on a mathematical basis yet. Some of the existing works examine how smoothing influences
the bias on the registration with gradient-based methods [13], and the bias and model
conditioning in optical flow [17], [18]; however, their scope is limited to methods employing
a linear approximation of the image intensity function. Hence, the understanding of the exact
relation between smoothing and the well-behavedness of the image dissimilarity function con- 
stitutes the first objective of this study. Our second objective is to characterize the effect of 
noise on the performance of multi-scale image registration, i.e., to derive the noise performance 
in the registration of noisy images as a function of the smoothing parameter. 

We consider a setting where the geometric transformation between the reference and target 
images is a global 2-D translation. Although the registration problem is formulated for an 
image pair in this work, one can equivalently assume that the considered reference and target 
patterns are image patches rather than complete images. For this reason, our study is of 
interest not only for registration applications where the transformation between the image pair is 
modeled by a pure translation (e.g., as in satellite images), but also for various motion estimation 
techniques, such as block-matching algorithms and region-based matching techniques in optical 
flow that assign constant displacement vectors to image subregions. We adopt an analytic and 
parametric model for the reference and target patterns and formulate the registration problem 
in the continuous domain of square-integrable functions $L^2(\mathbb{R}^2)$. We use the squared-distance
between the image intensity functions as the dissimilarity measure. This distance function is 
the continuous domain equivalent of SSD. We study two different aspects of image registration 
in this work; namely, alignment regularity and alignment accuracy. 

We first look at alignment regularity, i.e., the well-behavedness of the distance function, and
estimate the largest neighborhood of translations such that the distance function has only one 
local minimum, which is also the global minimum. Then we study the influence of smoothing 
the reference and target patterns on the neighborhood of translations recoverable with local 
minimizers such as descent-type algorithms without getting trapped in a local minimum. In
more detail, we consider the set of patterns that are generated by the translations of a reference
pattern, which forms the translation manifold of that pattern. In the examination of
the alignment regularity, we assume that the target pattern lies on the translation manifold of
the reference pattern. We then consider the distance function $f(U)$ between the reference and
target patterns, where $U$ is the translation vector. The global minimum of $f$ is at the origin
$U = 0$. Then, in the translation parameter domain, we consider the largest open neighborhood
around the origin within which $f$ is an increasing function along any ray starting out from the
origin. We call this neighborhood the Single Distance Extremum Neighborhood (SIDEN). The
SIDEN of a reference pattern is important in the sense that it defines the translations that
can be correctly recovered by minimizing $f$ with a descent method. We derive an analytic
estimation of the SIDEN. Then, in order to study the effect of smoothing on the alignment 
regularity, we consider the registration of low-pass filtered versions of the reference and target 
patterns and examine how the SIDEN varies with the filter size. Our main result is that the 
volume (area) of the SIDEN increases at a rate of at least $O(1 + \rho^2)$ with respect to the size $\rho$
of the low-pass filter kernel, which controls the level of smoothing. This formally shows that, 
when the patterns are low-pass filtered, a wider range of translation values can be recovered 
with descent-type methods; hence, smoothing improves the regularity of alignment. Then, we 
demonstrate the usage of our SIDEN estimate for constructing a regular multiresolution grid in 
the translation parameter domain with exact alignment guarantees. Based on our estimation 
of the neighborhood of translations that are recoverable with descent methods, we design an 
adaptive search grid in the translation parameter domain such that large translations can be 
recovered by locating the closest solution on the grid and then refining this estimation with a 
descent method. 

Then we look at alignment accuracy and study the effect of image noise on the accuracy 
of image alignment. We also characterize the influence of low-pass filtering on the alignment 
accuracy in a noisy setting. This is an important question, as the target image is rarely an exactly 
translated version of the reference image in practical applications. When the target pattern is 
noisy, it is not exactly on the translation manifold of the reference pattern. The noise on the 
target pattern causes the global minimum of the distance function to deviate from the solution 
U = 0. We formulate the alignment error as the perturbation in the global minimum of the 
distance function, which corresponds to the misalignment between the image pair due to noise. 
We focus on two different noise models. In the first setting, we look at Gaussian noise. In the 
second setting, we examine arbitrary square-integrable noise patterns, where we consider general 
noise patterns and noise patterns that have small correlation with the points on the translation 
manifold of the reference pattern. We derive upper bounds on the alignment error in terms of the 
noise level and the pattern parameters in both settings. We then consider the smoothing of the 
reference and target patterns in these settings and look at the variation of the alignment error 
with the noise level and the filter size. It turns out that the alignment error bound increases at a 
rate of O (77^^^ (1 — J?)"^^^) and O (y^/"^ (1 — zy)~^/^) in respectively the first and second settings 
with respect to the noise level, where $\eta$ is the standard deviation of the Gaussian noise, and $\nu$
is the norm of the noise pattern. Another observation is that the alignment error is small if the
noise pattern has small correlation with translated versions of the reference pattern. Moreover, 
the alignment error bounds increase at the rates O [p"^^^ (1 — p)~^^'^) and O ((1 -|- p^)^^^) in the 
first and second settings, with respect to the filter size p. Therefore, our main finding is that 
smoothing the image pair tends to increase the alignment error when the target pattern does 
not lie on the translation manifold of the reference pattern. The experimental results confirm 
that the behavior of the theoretical bound as a function of the noise level and filter size reflects 
well the behavior of the actual error. 

The results of our analysis show that smoothing has the desirable effect of improving the 
well-behavedness of the distance function; however, it also leads to the amplification of the 
alignment error caused by the image noise. This suggests that, in the development of multiscale 
image registration methods, one needs to take the noise level into account as a design parameter. 

The rest of the text is organized as follows. In Section 2, we give an overview of related
work. In Section 3, we focus on the alignment regularity problem, where we first derive an
estimation of the SIDEN and then examine its variation with filtering. Then, in Section 4, we
look into the alignment accuracy problem and present our results regarding the influence of
noise on the alignment accuracy. In Section 5, we present experimental results. In Section 6,
we give a discussion of our results and interpret them in comparison with the previous studies
in the literature. Finally, we conclude in Section 7.

2 Related Work 

The problem of estimating the displacement between two images has been studied extensively 
in the image registration [19] and motion estimation [6] literatures. Here we limit our discussion
mostly to region-based methods. We first give a brief overview of some hierarchical multiscale 
registration and motion estimation methods and then mention some theoretical results about 
image alignment.

The coarse-to-fine alignment strategy has been used in various types of image registration
applications; e.g., registration for stereo vision [5], alignment with multiresolution tangent-distance
for image analysis [3], and biomedical image registration [9]. The hierarchical search strategy has
proved useful in motion estimation, since it accelerates the algorithm and leads to better so- 
lutions with reduced sensitivity to local minima [HI, |10j . It is also used very commonly 
in gradient-based optical flow techniques such as those in [5], [12], which apply a first-order
approximation of the variations in the image intensity function. The hierarchical search that 
filters and downsamples the images permits the design of gradient-based methods that remain 
in the domain of linearity. 

Region-based registration and motion estimation methods use different dissimilarity measures
and optimization techniques. Many methods use SSD (sum-of-squared difference) as the
dissimilarity measure [8]. SSD corresponds to the squared norm of what is usually called the
displaced frame difference (DFD) in motion estimation. The direct correlation is also widely
used as a similarity measure, and it can be shown to be equivalent to SSD [13].

In this work, we consider (the continuous domain equivalent of) SSD as the dissimilarity 
measure. We will essentially consider gradient descent as the minimization technique in our 
analysis; however, our main motivation is to understand to what extent local minimizers are 
efficient in image registration. Hence, the implications of our study concern a wide range of 
registration and motion estimation techniques that minimize SSD (or its approximations) with 
local minimizers, e.g., [2D], [5T], ^2], |TD], [J, and fast block-matching methods relying on 
convexity assumptions 

We now overview some theoretical studies about the performance of registration algorithms.
The work by Robinson et al. [13] studies the estimation of global translation parameters between
an image pair corrupted with additive Gaussian noise. The authors first derive the Cramér-Rao
lower bound (CRLB) on the translation estimation. Given by the inverse of the Fisher
information matrix, the CRLB is a general lower bound for the MSE of an estimator that
computes a set of parameters from noisy observations. The authors then examine the bias
on multiscale gradient-based methods. A detailed discussion of the results in [13] is given in
Section 6 along with a comparison to our results. Another work that examines Cramér-Rao
lower bounds in registration is given in [14], where the bounds are derived for several models
with transformations estimated from a set of matched feature points with noisy positions.

The studies in [17], [18] have also examined the bias of gradient-based shift estimators and
shown that presmoothing the images reduces the bias on the estimator. However, smoothing
also has the undesired effect of impairing the conditioning of the linear system to be solved
in gradient-based estimators [17]. Therefore, this tradeoff must be taken into account in the
selection of the filter size in coarse-to-fine gradient-based registration. The papers [13], [23]
furthermore show that the bias on gradient-based estimators increases as the amount of translation
increases. Robinson et al. [13] use this observation to explain the benefits of multiscale
gradient-based methods. At large scales, downsampling, which reduces the amount of translation, and
smoothing help to decrease the bias on the estimator. Then, as the change in the translation
parameters is small at fine scales, the estimation does not suffer from this type of bias anymore.
Moreover, at fine scales, the accuracy of the estimation increases as high-frequency components
are no longer suppressed. This is due to the fact that the CRLB of the estimation is smaller
when the bandwidth of the image is larger. 

The article [21] is a recent theoretical study on the accuracy of subpixel block-matching in 
stereo vision, which has relations to our work. The paper first examines the relation between 
the discrete and continuous block-matching distances, and then presents a continuous-domain 
analysis of the effect of noise on the accuracy of disparity estimation from a rectified stereo pair 
corrupted with additive Gaussian noise. An estimation of the disparity that globally minimizes 
the windowed squared-distance between blocks is derived. A comparison of the results presented
in [21] and in our work is given in Section 6.

Such studies focus rather on the alignment accuracy problem and do not look at the alignment 
regularity aspect of image registration. Moreover, the works that have studied the alignment 
accuracy of multiscale registration by examining the effect of smoothing are limited to
gradient-based methods, i.e., methods that employ a linear approximation of image intensities. In this
work, we address both of these issues and derive bounds on both alignment regularity and 
alignment accuracy in multiscale registration. 



3 Analysis of Alignment Regularity 
3.1 Notation and Problem Formulation 

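Let $p \in L^2(\mathbb{R}^2)$ be a visual pattern. In order to study the image registration problem analytically,
we adopt a representation of $p$ in an analytic and parametric dictionary manifold

$$\mathcal{D} = \{\phi_\gamma : \gamma = (\psi, \tau_x, \tau_y, \sigma_x, \sigma_y) \in \Gamma\} \subset L^2(\mathbb{R}^2). \tag{1}$$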



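Here, each atom $\phi_\gamma$ of the dictionary $\mathcal{D}$ is derived from an analytic mother function $\phi$ by a
geometric transformation specified by the parameter vector $\gamma$, where $\psi$ is a rotation parameter,
$\tau_x$ and $\tau_y$ denote translations in the $x$ and $y$ directions, and $\sigma_x$ and $\sigma_y$ represent an anisotropic scaling
in the $x$ and $y$ directions. $\Gamma$ is the transformation parameter domain over which the dictionary is
defined. By defining the spatial coordinate variable $X = [x \ y]^T \in \mathbb{R}^{2 \times 1}$, we refer to the mother
function as $\phi(X)$. Then an atom is given by

$$\phi_\gamma(X) = \phi\big(\sigma^{-1} \Psi^{-1} (X - \tau)\big), \tag{2}$$

where $\sigma = \mathrm{diag}(\sigma_x, \sigma_y)$ is the scale matrix, $\Psi$ is the rotation matrix of angle $\psi$, and
$\tau = [\tau_x \ \tau_y]^T$ is the translation vector.
Therefore, the atom $\phi_\gamma$ is the function $\phi$ scaled by $\sigma$, rotated by $\Psi$, and translated by $\tau$.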

It is shown in [28] (in the proof of Proposition 2.1.2) that the linear span of a dictionary
$\mathcal{D}$ generated with respect to the transformation model in (2) is dense in $L^2(\mathbb{R}^2)$ if the mother
function has nontrivial support (unless $\phi(X) = 0$ almost everywhere). In our analysis, we
choose $\phi$ to be the Gaussian function

$$\phi(X) = e^{-X^T X} = e^{-(x^2 + y^2)} \tag{3}$$

as it has good time-localization and it is easy to treat in derivations due to its well-studied
properties. This choice also ensures that $\mathrm{span}(\mathcal{D})$ is dense in $L^2(\mathbb{R}^2)$; therefore, any pattern
$p \in L^2(\mathbb{R}^2)$ can be approximated in $\mathcal{D}$ with arbitrary accuracy. In this work, we assume that a
sufficiently accurate approximation of $p$ with finitely many atoms in $\mathcal{D}$ is available; i.e.,



$$p(X) \approx \sum_{k=1}^{K} \lambda_k \, \phi_{\gamma_k}(X) \tag{4}$$



where $K$ is the number of atoms used in the representation of $p$, $\gamma_k$ are the atom parameters,
and $\lambda_k$ are the atom coefficients.

Throughout the discussion, $T = [T_x \ T_y]^T \in S^1$ denotes a unit-norm vector and $S^1$ is the unit
circle in $\mathbb{R}^2$. For referring to translation vectors, we use the notation $tT$, where $t \ge 0$ denotes
the magnitude of the vector (amount of translation) and $T$ defines the direction of translation.
Then, the translation manifold $\mathcal{M}(p)$ of $p$ is the set of patterns generated by the translations of
$p$:

$$\mathcal{M}(p) = \{p(X - tT) : T \in S^1, \ t \in [0, +\infty)\} \subset L^2(\mathbb{R}^2). \tag{5}$$

We consider the squared-distance between the reference pattern $p(X)$ and its translated
version $p(X - tT)$. This distance is the continuous domain equivalent of the SSD measure that
is widely used in registration methods. The squared-distance in the continuous domain is given
by

$$f(tT) = \|p(X) - p(X - tT)\|^2 = \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big)^2 \, dX \tag{6}$$

where the notation¹ $\|\cdot\|$ stands for the $L^2$-norm for vectors in $L^2(\mathbb{R}^2)$ and the $\ell^2$-norm for vectors
in $\mathbb{R}^2$.
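
As a rough numerical illustration of the distance in (6), the following Python sketch approximates $f(tT)$ by a Riemann sum for a small pattern built from Gaussian atoms. The two-atom pattern `ATOMS`, the grid extent, and the step size are hypothetical choices made only for this illustration, and the mother function is assumed to be $\phi(X) = e^{-X^T X}$; none of these values are taken from the analysis.

```python
import math

# Hypothetical pattern: (coefficient, (tau_x, tau_y), (sigma_x, sigma_y), psi) per atom.
ATOMS = [
    (1.0, (0.0, 0.0), (1.0, 0.6), 0.3),
    (-0.5, (1.5, -0.5), (0.8, 0.8), 0.0),
]


def atom_value(x, y, tau, sigma, psi):
    """Gaussian atom phi(sigma^-1 Psi^-1 (X - tau)), assuming phi(X) = exp(-X^T X)."""
    dx, dy = x - tau[0], y - tau[1]
    c, s = math.cos(psi), math.sin(psi)
    # Rotate back by psi, then rescale each axis.
    u = (c * dx + s * dy) / sigma[0]
    v = (-s * dx + c * dy) / sigma[1]
    return math.exp(-(u * u + v * v))


def pattern(x, y, atoms=ATOMS):
    """Pattern value p(X) as a weighted sum of atoms."""
    return sum(lam * atom_value(x, y, tau, sigma, psi)
               for lam, tau, sigma, psi in atoms)


def distance(t, direction, atoms=ATOMS, half_width=8.0, step=0.1):
    """Riemann-sum approximation of f(tT) = ||p(X) - p(X - tT)||^2."""
    shift_x, shift_y = t * direction[0], t * direction[1]
    n = int(round(2 * half_width / step)) + 1
    total = 0.0
    for i in range(n):
        x = -half_width + i * step
        for j in range(n):
            y = -half_width + j * step
            diff = pattern(x, y, atoms) - pattern(x - shift_x, y - shift_y, atoms)
            total += diff * diff
    return total * step * step


if __name__ == "__main__":
    T = (1.0, 0.0)  # unit direction of translation
    for t in (0.0, 0.5, 1.0, 2.0):
        print(t, distance(t, T))
```

The sketch is only meant to make the definition concrete; a closed-form evaluation is preferable whenever the atoms are Gaussian, since the integrals then have analytic expressions.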

The global minimum of $f$ is at the origin $tT = 0$. Therefore, there exists a region around
the origin within which the restriction of $f$ to a ray $tT_a$ starting out from the origin along an
arbitrary direction $T_a$ is an increasing function of $t > 0$ for all $T_a$. This allows us to define the
Single Distance Extremum Neighborhood (SIDEN) as follows.

Definition 1. We call the set of translation vectors

$$S = \{0\} \cup \Big\{\omega_T T : T \in S^1, \ \omega_T > 0, \ \text{and} \ \frac{df(tT)}{dt} > 0 \ \text{for all} \ 0 < t \le \omega_T \Big\} \tag{7}$$

the Single Distance Extremum Neighborhood (SIDEN) of the pattern $p$.

Note that the origin $\{0\}$ is included separately in the definition of SIDEN since the gradient
of $f$ vanishes at the origin and therefore $df(tT)/dt|_{t=0} = 0$ for all $T$. The SIDEN $S \subset \mathbb{R}^2$ is an
open neighborhood of the origin such that the only stationary point of $f$ inside $S$ is the origin.
We formulate this in the following proposition.

Proposition 1. Let $tT \in S$. Then $\nabla f(tT) = 0$ if and only if $tT = 0$.

Proof. Let $\nabla f(tT) = 0$ for some $tT \in S$. Then $\nabla_T f(tT) = 0$, which is the directional derivative
of $f$ along the direction $T$ at $tT$. This gives

$$\nabla_T f(tT) = \frac{d}{du} f(tT + uT) \Big|_{u=0} = \frac{d}{du} f\big((t + u)T\big) \Big|_{u=0} = \frac{df(tT)}{dt} = 0,$$

which implies that $t = 0$, as $tT \in S$. The second part $\nabla f(0) = 0$ of the statement also holds
clearly, since the global minimum of $f$ is at 0. □

Proposition 1 can be interpreted as follows. The only local minimum of the distance function
$f$ is at the origin in $S$. Therefore, when a translated version $p(X - tT)$ of the reference pattern
is aligned with $p(X)$ with a local optimization method like a gradient descent algorithm, the
local minimum achieved in $S$ is necessarily also the global minimum.
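
The behavior described above can be sketched numerically. The example below assumes a hypothetical pattern of two isotropic Gaussian atoms $e^{-\|X - \tau\|^2/\sigma^2}$ with a common scale, for which the inner product of an atom with a shifted atom is the standard Gaussian integral $(\pi \sigma^2 / 2) \exp(-\|d\|^2 / (2\sigma^2))$, where $d$ is the offset between the two centers. The atom positions, the step size, and the iteration count are illustrative assumptions. Plain gradient descent on $f$ recovers the origin from a starting point close to it, but settles in a different local minimum when started too far away.

```python
import math

SIGMA = 1.0                           # common atom scale (assumed)
CENTERS = [(0.0, 0.0), (6.0, 0.0)]    # two well-separated bumps (assumed)
COEFFS = [1.0, 1.0]


def grad_distance(u):
    """Gradient of f(U) = ||p(X) - p(X - U)||^2 with respect to U."""
    gx = gy = 0.0
    amp = math.pi * SIGMA ** 2 / 2.0  # inner product of two coincident atoms
    for lj, tj in zip(COEFFS, CENTERS):
        for lk, tk in zip(COEFFS, CENTERS):
            # Offset between atom j and the shifted atom k.
            dx = tk[0] + u[0] - tj[0]
            dy = tk[1] + u[1] - tj[1]
            w = lj * lk * amp * math.exp(-(dx * dx + dy * dy) / (2.0 * SIGMA ** 2))
            # f = const - 2 * sum(w), and grad w = -(d / SIGMA^2) * w.
            gx += 2.0 * w * dx / SIGMA ** 2
            gy += 2.0 * w * dy / SIGMA ** 2
    return gx, gy


def gradient_descent(u0, step=0.1, iters=500):
    """Fixed-step gradient descent on f starting from the translation u0."""
    u = list(u0)
    for _ in range(iters):
        gx, gy = grad_distance(u)
        u[0] -= step * gx
        u[1] -= step * gy
    return tuple(u)


if __name__ == "__main__":
    print(gradient_descent((2.0, 0.5)))  # start close to the origin
    print(gradient_descent((5.0, 0.0)))  # start beyond the first zero-crossing of df/dt
```

In this toy setting the second run stops near the translation that superimposes one bump on the other, which is a local but not a global minimum of $f$.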

The goal of our analysis is now the following. Given a reference pattern $p$, we would like to
find an analytical estimation of $S$. However, the exact derivation of $S$ requires the calculation
of the exact zero-crossings of $df(tT)/dt$, which is not easy to do analytically. Instead, one can
characterize the SIDEN by computing a neighborhood $Q$ of 0 that lies completely in $S$; i.e.,
$Q \subset S$. $Q$ can be derived by using a polynomial approximation of $f$ and calculating, for all unit
directions $T$, a lower bound $\delta_T$ for the supremum of $\omega_T$ such that $\omega_T T$ is in $S$. This not
only provides an analytic estimation of the SIDEN, but also defines a set that is known to be
completely inside the SIDEN. The regions $S$ and $Q$ are illustrated in Figure 1.

In Section 3.2, we derive $Q$. In particular, $Q$ is obtained in the form of a compact analytic set
and $f$ is a differentiable function. This guarantees that, if the translation that aligns the image
pair perfectly is in the set $Q$, the distance function $f$ can be minimized with gradient descent
algorithms; the solution converges to a local minimum of $f$ in $Q$, which is necessarily the global
minimum of $f$, resulting in a perfect alignment. Moreover, we will see in Section 5 that the
knowledge of a set $Q \subset S$ permits us to design a registration algorithm that can recover large
translations perfectly.

Finally, as $Q$ is obtained analytically and parametrically, it is simple to examine its variation
with the low-pass filtering applied to $p$. This is helpful for gaining an understanding of the
relation between the alignment regularity and smoothing. We study this relation in Section 3.3.

¹ Since it is clear from the context which one of these norms is meant, we denote these two norms in the same way
for simplicity of notation.
Figure 1: Illustration of the SIDEN $S$ and its estimation $Q$. $S$ is the largest open neighborhood
around the origin within which the distance function $f$ is an increasing function along all rays
starting out from the origin. Therefore, along each unit direction $T$, $S$ covers points $\omega_T T$ such that
$f(tT)$ is increasing between 0 and $\omega_T T$. The estimation $Q$ of the SIDEN is obtained by computing
a lower bound $\delta_T$ for the first zero-crossing of $df(tT)/dt$.

3.2 Estimation of SIDEN 

We now derive an estimation $Q$ for the Single Distance Extremum Neighborhood $S$. In the
following, we consider $T$ to be a fixed unit direction in $S^1$. We derive $Q \subset S$ by computing a
$\delta_T$ which guarantees that $df(tT)/dt > 0$ for all $0 < t \le \delta_T$. In the derivation of $Q$, we need
a closed-form expression for $df(tT)/dt$. Since $f$ is the distance between the patterns $p(X)$ and
$p(X - tT)$ that are represented in terms of Gaussian atoms (see Eq. (4)), it involves the integration
of the multiplication of pairs of Gaussian atoms. We will use the following proposition about
the integration of the product of Gaussian atoms [26].

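Proposition 2. Let $\phi_{\gamma_j}(X) = \phi(\sigma_j^{-1} \Psi_j^{-1} (X - \tau_j))$ and $\phi_{\gamma_k}(X) = \phi(\sigma_k^{-1} \Psi_k^{-1} (X - \tau_k))$. Then

$$\int_{\mathbb{R}^2} \phi_{\gamma_j}(X) \, \phi_{\gamma_k}(X) \, dX = \frac{\pi |\sigma_j \sigma_k|}{2 \sqrt{|\Sigma_{jk}|}} \exp\Big(-\frac{1}{2} (\tau_k - \tau_j)^T \Sigma_{jk}^{-1} (\tau_k - \tau_j)\Big)$$

where

$$\Sigma_{jk} := \frac{1}{2} \big(\Psi_j \sigma_j^2 \Psi_j^{-1} + \Psi_k \sigma_k^2 \Psi_k^{-1}\big).$$

The symbol $\Sigma_{jk}$ defined in Proposition 2 is a function of the parameters of the $j$-th and $k$-th
atoms. We also denote

$$a_{jk} := \frac{1}{2} T^T \Sigma_{jk}^{-1} T, \qquad b_{jk} := \frac{1}{2} T^T \Sigma_{jk}^{-1} (\tau_k - \tau_j), \qquad c_{jk} := \frac{1}{2} (\tau_k - \tau_j)^T \Sigma_{jk}^{-1} (\tau_k - \tau_j) \tag{8}$$

and define

$$Q_{jk} := \frac{\pi |\sigma_j \sigma_k|}{\sqrt{|\Sigma_{jk}|}} \exp(-c_{jk}). \tag{9}$$

Notice that $a_{jk} > 0$ and $c_{jk} \ge 0$ since $\|T\| = 1$ and $\Sigma_{jk}$, $\Sigma_{jk}^{-1}$ are positive definite matrices. By
definition, $Q_{jk} > 0$ as well. Note also that $a_{jk}$ and $b_{jk}$ are functions of the unit direction $T$;
however, for the sake of simplicity we avoid expressing their dependence on $T$ explicitly in our
notation.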
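
The closed form for the inner product of two Gaussian atoms can be checked numerically. The sketch below assumes the mother function $\phi(X) = e^{-X^T X}$ and compares a Riemann-sum approximation of the integral with the expression $\pi |\sigma_j \sigma_k| / (2 \sqrt{|\Sigma_{jk}|}) \exp(-\frac{1}{2} (\tau_k - \tau_j)^T \Sigma_{jk}^{-1} (\tau_k - \tau_j))$, where $\Sigma_{jk}$ is half the sum of the two matrices $\Psi \sigma^2 \Psi^{-1}$. The atom parameters and the grid settings are arbitrary illustrative values.

```python
import math


def shape_matrix(sigma, psi):
    """A = Psi diag(sigma_x^2, sigma_y^2) Psi^T, returned as (a11, a12, a22)."""
    c, s = math.cos(psi), math.sin(psi)
    sx2, sy2 = sigma[0] ** 2, sigma[1] ** 2
    return (c * c * sx2 + s * s * sy2, c * s * (sx2 - sy2), s * s * sx2 + c * c * sy2)


def atom(x, y, tau, sigma, psi):
    """phi(sigma^-1 Psi^-1 (X - tau)) with phi(X) = exp(-X^T X) (assumed)."""
    a11, a12, a22 = shape_matrix(sigma, psi)
    det = a11 * a22 - a12 * a12
    dx, dy = x - tau[0], y - tau[1]
    # Quadratic form (X - tau)^T A^-1 (X - tau) for the symmetric 2x2 matrix A.
    return math.exp(-(a22 * dx * dx - 2.0 * a12 * dx * dy + a11 * dy * dy) / det)


def closed_form(atom_j, atom_k):
    """Closed-form inner product of two atoms given as (tau, sigma, psi)."""
    (tau_j, sig_j, psi_j), (tau_k, sig_k, psi_k) = atom_j, atom_k
    aj, ak = shape_matrix(sig_j, psi_j), shape_matrix(sig_k, psi_k)
    # Sigma_jk = (A_j + A_k) / 2
    s11, s12, s22 = ((aj[i] + ak[i]) / 2.0 for i in range(3))
    det_s = s11 * s22 - s12 * s12
    dx, dy = tau_k[0] - tau_j[0], tau_k[1] - tau_j[1]
    quad = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det_s
    det_sigmas = sig_j[0] * sig_j[1] * sig_k[0] * sig_k[1]
    return math.pi * det_sigmas / (2.0 * math.sqrt(det_s)) * math.exp(-0.5 * quad)


def riemann_sum(atom_j, atom_k, half_width=10.0, step=0.1):
    """Numerical approximation of the integral of the product of two atoms."""
    n = int(round(2 * half_width / step)) + 1
    total = 0.0
    for i in range(n):
        x = -half_width + i * step
        for j in range(n):
            y = -half_width + j * step
            total += atom(x, y, *atom_j) * atom(x, y, *atom_k)
    return total * step * step


if __name__ == "__main__":
    atom_j = ((0.0, 0.0), (1.0, 0.6), 0.4)     # hypothetical parameters
    atom_k = ((0.8, -0.5), (0.7, 1.2), -0.9)   # hypothetical parameters
    print(riemann_sum(atom_j, atom_k), closed_form(atom_j, atom_k))
```

Replacing $\tau_k$ by $\tau_k + tT$ in the same expression gives the inner products that appear in $f(tT)$, which is how the terms $a_{jk}$, $b_{jk}$, and $c_{jk}$ arise.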

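We can now give our result about the estimation of the SIDEN.

Theorem 1. The region $Q \subset \mathbb{R}^2$ is a subset of the SIDEN $S$ of the pattern $p$ if

$$Q = \{tT : T \in S^1, \ 0 \le t \le \delta_T\} \tag{10}$$

where $\delta_T$ is the only positive root of the polynomial $|\alpha_4| t^3 - \alpha_3 t^2 - \alpha_1$ and

$$\alpha_1 = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk} \big(2 a_{jk} - 4 b_{jk}^2\big)$$

$$\alpha_3 = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk} \Big(-\frac{8}{3} b_{jk}^4 + 8 b_{jk}^2 a_{jk} - 2 a_{jk}^2\Big)$$

$$\alpha_4 = -1.37 \sum_{j=1}^{K} \sum_{k=1}^{K} |\lambda_j \lambda_k| \, Q_{jk} \exp\Big(\frac{b_{jk}^2}{a_{jk}}\Big) a_{jk}^{5/2}$$

are constants depending on $T$ and on the parameters $\gamma_k$, $\lambda_k$ of the atoms in $p$.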

The proof of Theorem 1 is given in Appendix A.1. It applies a Taylor expansion of the
derivative of the distance function $f(tT)$ with respect to $t$ along a fixed direction $T$, and derives
$\delta_T$ such that $df(tT)/dt$ is positive for all $0 < t \le \delta_T$. Therefore, along each direction $T$, $\delta_T$ constitutes
a lower bound for the first zero-crossing of $df(tT)/dt$ (see Figure 1 for an illustration of $\delta_T$).
By varying $T$ over the unit circle, one obtains a closed neighborhood $Q$ of 0 that is a subset of
$S$. This region can be analytically computed using only the parametric representation of $p$ and
provides an estimate for the range of translations $tT$ over which $p(X)$ can be exactly aligned
with $p(X - tT)$.



3.3 Variation of SIDEN with Smoothing 

We now examine how smoothing the reference pattern $p$ with a low-pass filter influences its
SIDEN. We assume a Gaussian kernel for the filter. The Gaussian function is a commonly used
kernel for low-pass filtering, and its distinctive properties have made it popular in scale-space
theory research [37] (see Section 6 for a more detailed discussion). We assume that $p$ is filtered
with a Gaussian kernel of the form

$$\frac{1}{\pi \rho^2} \phi_\rho(X) \tag{11}$$

with unit $L^1$-norm. The function $\phi_\rho(X) = \phi(\Upsilon^{-1} X)$ is an isotropic Gaussian atom with scale
matrix

$$\Upsilon = \begin{bmatrix} \rho & 0 \\ 0 & \rho \end{bmatrix}. \tag{12}$$



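The scale parameter $\rho$ controls the size of the Gaussian kernel. We denote the smoothed version
of the reference pattern $p(X)$ by $\hat{p}(X)$, which is given as

$$\hat{p}(X) = \frac{1}{\pi \rho^2} \phi_\rho(X) * p(X) \approx \sum_{k=1}^{K} \lambda_k \frac{1}{\pi \rho^2} \phi_\rho(X) * \phi_{\gamma_k}(X) \tag{13}$$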

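by linearity of the convolution operator. In order to calculate $\hat{p}$, we use the following proposition,
which gives the expression for the Gaussian atom obtained from the convolution of two Gaussian
atoms.

Proposition 3. Let $\phi_{\gamma_1}(X) = \phi(\sigma_1^{-1} \Psi_1^{-1} (X - \tau_1))$ and $\phi_{\gamma_2}(X) = \phi(\sigma_2^{-1} \Psi_2^{-1} (X - \tau_2))$. Then

$$\phi_{\gamma_1}(X) * \phi_{\gamma_2}(X) = \frac{\pi |\sigma_1 \sigma_2|}{|\sigma_3|} \phi_{\gamma_3}(X) \tag{14}$$

where

$$\phi_{\gamma_3}(X) = \phi\big(\sigma_3^{-1} \Psi_3^{-1} (X - \tau_3)\big)$$

and the parameters of $\phi_{\gamma_3}(X)$ are given by

$$\tau_3 = \tau_1 + \tau_2, \qquad \Psi_3 \sigma_3^2 \Psi_3^{-1} = \Psi_1 \sigma_1^2 \Psi_1^{-1} + \Psi_2 \sigma_2^2 \Psi_2^{-1}.$$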

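From Proposition 3, we obtain

$$\frac{1}{\pi \rho^2} \phi_\rho(X) * \phi_{\gamma_k}(X) = \frac{|\sigma_k|}{|\hat{\sigma}_k|} \phi_{\hat{\gamma}_k}(X) \tag{15}$$

where $\phi_{\hat{\gamma}_k}(X) = \phi(\hat{\sigma}_k^{-1} \hat{\Psi}_k^{-1} (X - \hat{\tau}_k))$ and

$$\hat{\tau}_k = \tau_k, \qquad \hat{\Psi}_k = \Psi_k, \qquad \hat{\sigma}_k = \sqrt{\Upsilon^2 + \sigma_k^2}. \tag{16}$$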

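Hence, when $p$ is smoothed with a Gaussian filter, the atom $\phi_{\gamma_k}(X)$ with coefficient $\lambda_k$ in the
original pattern $p$ is replaced by the smoothed atom $\phi_{\hat{\gamma}_k}(X)$ with coefficient

$$\hat{\lambda}_k = \frac{|\sigma_k|}{|\hat{\sigma}_k|} \lambda_k = \frac{|\sigma_k|}{\sqrt{|\Upsilon^2 + \sigma_k^2|}} \lambda_k = \frac{\sigma_{x,k} \, \sigma_{y,k}}{\sqrt{(\rho^2 + \sigma_{x,k}^2)(\rho^2 + \sigma_{y,k}^2)}} \lambda_k \tag{17}$$

where $\sigma_k = \mathrm{diag}(\sigma_{x,k}, \sigma_{y,k})$. This shows that the change in the pattern parameters due to
the filtering can be captured by substituting the scale parameters $\sigma_k$ with $\hat{\sigma}_k$ and replacing the
coefficients $\lambda_k$ with $\hat{\lambda}_k$. Thus, the smoothed pattern $\hat{p}$ is sparsely representable in the dictionary
$\mathcal{D}$ as

$$\hat{p}(X) \approx \sum_{k=1}^{K} \hat{\lambda}_k \, \phi_{\hat{\gamma}_k}(X). \tag{18}$$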
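
The parameter update caused by filtering is simple enough to sketch in code. The function `smooth_atom` below (a hypothetical helper name) maps an atom coefficient and its scales to their smoothed counterparts: each squared scale grows by $\rho^2$ and the coefficient shrinks by the ratio of the scale determinants. The helper `check_1d` compares a numerical one-dimensional convolution with the corresponding closed form, assuming a one-dimensional kernel $e^{-s^2/\rho^2} / (\sqrt{\pi} \rho)$ and an atom $e^{-s^2/\sigma^2}$; the two-dimensional isotropic kernel is the product of two such factors.

```python
import math


def smooth_atom(lam, sigma_x, sigma_y, rho):
    """Coefficient and scales of an atom after filtering with a Gaussian kernel of size rho."""
    sx_hat = math.sqrt(rho ** 2 + sigma_x ** 2)
    sy_hat = math.sqrt(rho ** 2 + sigma_y ** 2)
    lam_hat = lam * sigma_x * sigma_y / (sx_hat * sy_hat)
    return lam_hat, sx_hat, sy_hat


def check_1d(sigma, rho, x, half_width=40.0, step=0.01):
    """Return (numerical, closed-form) values of the 1-D smoothed atom at position x."""
    n = int(round(2 * half_width / step)) + 1
    numeric = 0.0
    for i in range(n):
        s = -half_width + i * step
        kernel = math.exp(-(s / rho) ** 2) / (math.sqrt(math.pi) * rho)
        numeric += kernel * math.exp(-((x - s) / sigma) ** 2)
    numeric *= step
    sigma_hat = math.sqrt(rho ** 2 + sigma ** 2)
    closed = sigma / sigma_hat * math.exp(-(x / sigma_hat) ** 2)
    return numeric, closed


if __name__ == "__main__":
    print(smooth_atom(1.0, 4.0, 4.0, 3.0))
    print(check_1d(1.5, 2.0, 0.7))
```

Because the kernel is isotropic, the rotation and translation parameters of the atom are unaffected, so only the coefficient and the two scales need to be updated.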



Considering the same setting as in Section 3.1, where the target pattern $p(X - tT)$ is exactly
a translated version of the reference pattern $p(X)$, we now assume that both the reference and
target patterns are low-pass filtered, as is typically done in hierarchical image registration
algorithms. The above equations show that, when a pattern is low-pass filtered, the scale
parameters of its atoms increase and the atom coefficients decrease proportionally to the size
of the filter kernel, leading to a spatial diffusion of the image intensity function of the pattern.
The goal of this section is to show that this diffusion increases the volume of the SIDEN. We
achieve this by analyzing the variation of the smoothed SIDEN estimate $\hat{Q}$ corresponding to
the smoothed distance

$$\hat{f}(tT) = \int_{\mathbb{R}^2} \big(\hat{p}(X) - \hat{p}(X - tT)\big)^2 \, dX \tag{19}$$

with respect to the filter size $\rho$. Since the smoothed pattern has the same parametric form
(18) as the original pattern, the variation of $\hat{Q}$ with $\rho$ can be analyzed easily by examining the
dependence of the parameters involved in the derivation of $\hat{Q}$ on $\rho$. In the following, we express
the terms in Section 3.2 that have a dependence on $\rho$ with the notation $\hat{(\cdot)}$, such as $\hat{a}_{jk}$, $\hat{b}_{jk}$, $\hat{\lambda}_k$,
$\hat{\sigma}_k$. We write the terms that do not depend on $\rho$ in the same way as before; e.g., $t$, $T$, $\tau_k$, $S^1$.

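Now, we can apply the result of Theorem 1 for the smoothed pattern $\hat{p}(X)$. For a given
kernel size $\rho$, the smoothed versions $\hat{a}_{jk}$, $\hat{b}_{jk}$, $\hat{c}_{jk}$, $\hat{Q}_{jk}$ of the parameters in (8) and (9) can be obtained
by replacing the scale parameters $\sigma_k$ with $\hat{\sigma}_k$ defined in (16). Then, the smoothed SIDEN estimate
corresponding to $\hat{p}$ is given as

$$\hat{Q} = \{tT : T \in S^1, \ 0 \le t \le \hat{\delta}_T\} \tag{20}$$

where $\hat{\delta}_T$ is the positive root of the polynomial $|\hat{\alpha}_4| t^3 - \hat{\alpha}_3 t^2 - \hat{\alpha}_1$ such that

$$\hat{\alpha}_1 = \sum_{j=1}^{K} \sum_{k=1}^{K} \hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} \big(2 \hat{a}_{jk} - 4 \hat{b}_{jk}^2\big) \tag{21}$$

$$\hat{\alpha}_3 = \sum_{j=1}^{K} \sum_{k=1}^{K} \hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} \Big(-\frac{8}{3} \hat{b}_{jk}^4 + 8 \hat{b}_{jk}^2 \hat{a}_{jk} - 2 \hat{a}_{jk}^2\Big) \tag{22}$$

$$\hat{\alpha}_4 = -1.37 \sum_{j=1}^{K} \sum_{k=1}^{K} |\hat{\lambda}_j \hat{\lambda}_k| \, \hat{Q}_{jk} \exp\Big(\frac{\hat{b}_{jk}^2}{\hat{a}_{jk}}\Big) \hat{a}_{jk}^{5/2}. \tag{23}$$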



Similarly to the derivation in Section 3.2, the terms $\hat{a}_{jk}$, $\hat{b}_{jk}$, $\hat{c}_{jk}$, $\hat{Q}_{jk}$ are associated with the
integration of the product of smoothed Gaussian atom pairs, and they appear in the closed-form
expression of the derivative $d\hat{f}(tT)/dt$ of the smoothed distance function.

We are now ready to give the following result, which summarizes the dependence of the
smoothed SIDEN estimate on the filter size $\rho$.

Theorem 2. Let $V(\hat{Q})$ denote the volume (area) of the SIDEN estimate $\hat{Q}$ for the smoothed
pattern $\hat{p}$. Then, the order of dependence of the volume of $\hat{Q}$ on $\rho$ is given by $V(\hat{Q}) = O(1 + \rho^2)$.



We prove Theorem [2] in Appendix |A.2 The proof is based on the examination of the order 
of variation of djk, bjk, Cjk, Qjk with p, which is then used to derive the dependence of St on p. 

Theorem 2 is the main result of this section. It states that the volume of the SIDEN estimate
increases with the size of the filter applied on the patterns to be aligned. The theorem
shows that the volume of the region of translations for which the reference pattern $\hat{p}(X)$ can be
perfectly aligned with $\hat{p}(X - tT)$ using a descent method expands at the rate $O(1 + \rho^2)$ with
respect to the increase in the filter size $\rho$. Here, the order of variation $O(1 + \rho^2)$ is obtained
for the estimate $\hat{Q}$ of the SIDEN. Hence, one may wonder if the volume $V(\hat{S})$ of the SIDEN $\hat{S}$
has the same dependence on $\rho$. Remembering that $\hat{Q} \subset \hat{S}$ for all $\rho$, one immediate observation
is that the rate of expansion of $\hat{S}$ must be at least $O(1 + \rho^2)$; otherwise, there would exist a
sufficiently large value of $\rho$ such that $\hat{Q}$ is not included in $\hat{S}$. One can therefore conclude that
$V(\hat{S}) \ge V(\hat{Q}) = O(1 + \rho^2)$. However, this only gives a lower bound for the rate of expansion of
$\hat{S}$ and the exact rate of expansion of $\hat{S}$ may be larger. In the following, we give a few comments
about the variation of $\hat{S}$ with $\rho$.



Remark. As shown in the proof of Theorem 1, the derivative of the distance function $f(tT)$
is of the form

^^ = EE^^^'^-Q^^^^'=w (24) 

j=l k=l 



where 



Sjk 



(25) 



In order to derive $S$, one needs to exactly locate the smallest zero-crossing of the function
$df(tT)/dt$. This is not easy to do analytically due to the complicated form of the functions $s_{jk}(t)$,
which we have handled with polynomial approximations in the derivation of $Q$. However, in
order to gain an intuition about how the zero-crossings change with filtering, one can look
at the dependence of the extrema of the two additive terms in $s_{jk}(t)$ on $\rho$. The function
$e^{-(a_{jk} t^2 + 2 b_{jk} t)} (a_{jk} t + b_{jk})$ has two extrema at

$$\mu_0, \ \mu_1 = -\frac{b_{jk}}{a_{jk}} \pm \frac{1}{\sqrt{2 a_{jk}}} \qquad (26)$$

and the function $e^{-(a_{jk} t^2 - 2 b_{jk} t)} (a_{jk} t - b_{jk})$ has two extrema at

$$\mu_2, \ \mu_3 = \frac{b_{jk}}{a_{jk}} \pm \frac{1}{\sqrt{2 a_{jk}}}. \qquad (27)$$



Now replacing the original parameters $a_{jk}$, $b_{jk}$ with their smoothed versions $\hat{a}_{jk}$, $\hat{b}_{jk}$ and using
the result from the proof of Theorem 2 that $\hat{a}_{jk}$ and $\hat{b}_{jk}$ decrease at a rate of $O((1 + \rho^2)^{-1})$, it is
easy to show that the locations of the extrema $\hat{\mu}_0$, $\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\mu}_3$ change at a rate of $O((1 + \rho^2)^{1/2})$.
One may thus conjecture that the zero-crossings of $d\hat{f}(tT)/dt$ along a fixed direction $T$ might
also move at the same rate, which gives the volume of $\hat{S}$ as $V(\hat{S}) = O(1 + \rho^2)$.
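
The outward motion of the first zero-crossing can be checked numerically on a toy case. The sketch below is a 1-D analogue only (the analysis here is in 2-D), and it rests on several assumptions: atoms of the form exp(-((x - tau)/sigma)^2), smoothing that widens each scale to sqrt(sigma^2 + rho^2) and rescales the coefficient by sigma / sqrt(sigma^2 + rho^2), and a hypothetical two-atom pattern with opposite signs. The function names and all numeric values are illustrative and do not come from the paper.

```python
import numpy as np

def smoothed_pattern(x, lam, tau, sigma, rho):
    """Evaluate a 1-D sum of Gaussian atoms after Gaussian smoothing of size rho.

    Assumed atom form: lam * exp(-((x - tau) / sigma)**2). Smoothing widens each
    scale to sqrt(sigma^2 + rho^2) and shrinks the coefficient accordingly.
    """
    lam, tau, sigma = (np.asarray(v, dtype=float) for v in (lam, tau, sigma))
    sigma_hat = np.sqrt(sigma**2 + rho**2)
    lam_hat = lam * sigma / sigma_hat
    return np.sum(lam_hat * np.exp(-((x[:, None] - tau) / sigma_hat) ** 2), axis=1)

def first_zero_crossing(rho, lam, tau, sigma, t_max=15.0, dt=0.05, dx=0.01):
    """Smallest grid value t >= 0 after which the smoothed distance stops increasing."""
    x = np.arange(-40.0, 40.0, dx)
    ts = np.arange(0.0, t_max, dt)
    p = smoothed_pattern(x, lam, tau, sigma, rho)
    # Smoothed distance f(t) = integral of (p(x) - p(x - t))^2, by a Riemann sum.
    f = np.array([
        dx * np.sum((p - smoothed_pattern(x - t, lam, tau, sigma, rho)) ** 2)
        for t in ts
    ])
    decreasing = np.diff(f) <= 0
    return ts[np.argmax(decreasing)] if decreasing.any() else np.inf

# Hypothetical pattern: two atoms with opposite signs.
lam, tau, sigma = [1.0, -1.0], [0.0, 2.0], [1.0, 1.0]
radii = [first_zero_crossing(rho, lam, tau, sigma) for rho in (0.0, 1.0, 2.0, 4.0)]
print(radii)
```

In this toy setting the printed radius grows with the filter size, which is in line with the conjectured behavior; it does not verify the exact rate.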

On the other hand, $V(\hat{S})$ may also exhibit a different type of variation with $\rho$ depending on
the atom parameters of $p$. In particular, $V(\hat{S})$ may expand at a rate greater than $O(1 + \rho^2)$ for
some patterns. For example, we have the following property for patterns that consist of atoms
with the same sign.

Proposition 4. If all atom coefficients $\lambda_k$ of the reference pattern $p(X) = \sum_{k=1}^{K} \lambda_k \phi_{\gamma_k}(X)$
have the same sign ($\lambda_k > 0$ for all $k = 1, \dots, K$; or $\lambda_k < 0$ for all $k = 1, \dots, K$), then there
exists a threshold value $\rho_0$ of the filter size such that for all $\rho > \rho_0$, $\hat{S} = \mathbb{R}^2$.



The proof of the above proposition is given in Appendix A.3. The proposition states that
the translations of a reference pattern with only positive or only negative atom coefficients can
be recovered with gradient descent methods regardless of the amount of translation, provided
that the filter size $\rho$ is sufficiently large. In this case, $V(\hat{S}) = \infty$ for filter sizes larger than
the threshold $\rho_0$. Notice that the condition that the $\lambda_k$'s have the same sign is a sufficient but
not necessary condition in order to have this property. Patterns whose atoms with positive (or
negative) coefficients are dominant over the atoms with the opposite sign are likely to have this
property due to their resemblance to patterns consisting of atoms with coefficients of the same
sign.
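
The behavior described by the proposition can be illustrated with a small numerical experiment. The following is a 1-D sketch under the same kind of assumptions as a toy model would need: atoms exp(-((x - tau)/sigma)^2), smoothing that maps sigma to sqrt(sigma^2 + rho^2) with a matching coefficient rescaling, and a hypothetical pattern of two positive atoms. It only checks monotonicity of the distance along one direction on a finite range; it is not the argument of Appendix A.3.

```python
import numpy as np

def smoothed_pattern(x, lam, tau, sigma, rho):
    """1-D sum of Gaussian atoms (assumed form) after Gaussian smoothing of size rho."""
    lam, tau, sigma = (np.asarray(v, dtype=float) for v in (lam, tau, sigma))
    sigma_hat = np.sqrt(sigma**2 + rho**2)
    lam_hat = lam * sigma / sigma_hat
    return np.sum(lam_hat * np.exp(-((x[:, None] - tau) / sigma_hat) ** 2), axis=1)

def distance_profile(rho, lam, tau, sigma, ts, dx=0.01):
    """Smoothed distance f(t) = integral of (p(x) - p(x - t))^2 on a grid of t values."""
    x = np.arange(-40.0, 40.0, dx)
    p = smoothed_pattern(x, lam, tau, sigma, rho)
    return np.array([
        dx * np.sum((p - smoothed_pattern(x - t, lam, tau, sigma, rho)) ** 2)
        for t in ts
    ])

# Hypothetical pattern: two atoms with positive coefficients, centers 4 apart.
lam, tau, sigma = [1.0, 1.0], [0.0, 4.0], [1.0, 1.0]
ts = np.arange(0.0, 12.0, 0.05)

f_sharp = distance_profile(0.0, lam, tau, sigma, ts)    # no smoothing
f_smooth = distance_profile(5.0, lam, tau, sigma, ts)   # large filter

# Without smoothing the distance dips again when the shifted atoms overlap,
# so a descent method started far away can stall before reaching t = 0.
print("unsmoothed profile has a decreasing stretch:", bool(np.any(np.diff(f_sharp) < 0)))
# With a large filter the profile is increasing over the whole tested range.
print("smoothed profile is increasing:", bool(np.all(np.diff(f_smooth) > 0)))
```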

Theorem 2 describes the effect of smoothing images before alignment. One may then wonder
what the optimal filter size to be applied to the patterns before alignment is, given a reference
and a target pattern. Theorem 2 suggests that, if the target pattern is on the translation
manifold of the reference pattern, applying a large filter is always preferable as it provides a
large range of translations recoverable by descent algorithms. The accuracy of alignment does
not change with the filter size in this noiseless setting, since a perfect alignment is always
guaranteed with descent methods as long as the amount of translation is inside the SIDEN.
However, the assumption that the target pattern is exactly of the form $p(X - tT)$ is not realistic
in practice; i.e., in real image processing applications, the target image is likely to deviate from
$\mathcal{M}(p)$ due to the noise caused by image capture conditions, imaging model characteristics, etc.
Hence, we examine in Section 4 if filtering affects the accuracy of alignment when the target
image deviates from $\mathcal{M}(p)$.



4 Analysis of Alignment Accuracy in Noisy Settings 

We now analyze the effect of noise and smoothing on the accuracy of the estimation of translation
parameters. In general, noise causes a perturbation in the location of the global minimum of the
distance function. The perturbed version of the single global minimum of the noiseless distance
function $f$ will remain in the form of a single global minimum for the noisy distance function
with high probability if the noise level is sufficiently small. The noise similarly introduces a
perturbation on the SIDEN as well. The exact derivation of the SIDEN in the noisy setting
requires the examination of the first zero-crossings of the derivative of the noisy distance function
along arbitrary directions $T$ around its global minimum. At small noise levels, these
zero-crossings are expected to be perturbed versions of the first zero-crossings of $df(tT)/dt$ around
the origin, which define the boundary of the noiseless SIDEN $S$. The perturbation on the
zero-crossings depends on the noise level. If the noise level is sufficiently small, the perturbation
on the zero-crossings will be smaller than the distance between $S$ and its estimate $Q$. This is
due to the fact that $Q$ is a worst-case estimate for $S$ and its boundary is sufficiently distant
from the boundary of $S$ in practice, which is also confirmed by the experiments in Section 5. In
this case, the estimate $Q$ obtained from the noiseless distance function $f$ is also a subset of the
noisy SIDEN. Therefore, under the small noise assumption, $Q$ can be considered as an estimate
of the noisy SIDEN as well and it can be used in the alignment of noisy images in practice.^1
Our alignment analysis in this section relies on this assumption. Since we consider that the
reference and target patterns are aligned with a descent-type optimization method, the solution
will converge to the global minimum of the noisy distance function in the noisy setting. The
alignment error is then given by the change in the global minimum of the distance function,
which we analyze now.

The selection of the noise model for the representation of the deviation of the target pattern
from the translation manifold of the reference pattern depends on the imaging application. It is
common practice to represent non-predictable deviations of the image intensity function from
the image model with additive Gaussian noise. This noise model fits well the image intensity
variations due to imperfections of the image capture system, sensor noise, etc. Meanwhile,
in some settings, one may have prior knowledge of the type of the deviation of the target
image from the translation manifold of the reference image. For instance, the deviation from
the translation manifold may be due to some geometric image deformations, non-planar scene
structures, etc. In such settings, one may be able to bound the magnitude of the deviation of
the image intensity function from the translation model. Considering these, we examine two
different noise models in our analysis. We first focus on a setting where the target pattern is
corrupted with respect to an analytic noise model in the continuous space $L^2(\mathbb{R}^2)$. The analytic
noise model is inspired by the i.i.d. Gaussian noise in the discrete space $\mathbb{R}^n$. In Section 4.1,
we derive a probabilistic upper bound on the alignment error for this setting in terms of the
parameters of the reference pattern and the noise model. Then, in Section 4.2, we generalize the
results of Section 4.1 to arbitrary noise patterns in $L^2(\mathbb{R}^2)$ and derive an error bound in terms
of the norm of the noise pattern. The influence of smoothing the reference and target patterns
on the alignment error is discussed in Section 4.3.

Throughout Section 4, we use the notations $\overline{(\cdot)}$ and $\underline{(\cdot)}$ to refer respectively to upper and lower
bounds on the variable $(\cdot)$. The parameters corresponding to smoothed patterns are written as
$\hat{(\cdot)}$ as in Section 3.3. The notations $R_{(\cdot)}$ and $C_{(\cdot)}$ are used to denote important upper bounds
appearing in the main results, which are associated with the parameter in the subscript.

^1 The validity of this approximation is confirmed by the numerical simulation results in Section 5.

4.1 Derivation of an Upper Bound for Alignment Error for Gaussian Noise

We consider the noiseless reference pattern $p$ and a target pattern that is a noisy observation
of a translated version of $p$. We assume an analytical noise model given by

$$w(X) = \sum_{l=1}^{L} \zeta_l \, \phi_{\xi_l}(X) \qquad (28)$$

where the noise units $\phi_{\xi_l}(X)$ are Gaussian atoms of scale $\epsilon$. The coefficients $\zeta_l$ and the noise
atom parameters $\xi_l$ are assumed to be independent. The noise atoms are of the form
$\phi_{\xi_l}(X) = \phi(E^{-1}(X - \delta_l))$ where

$$E = \begin{bmatrix} \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}, \qquad \delta_l = \begin{bmatrix} \delta_{x,l} \\ \delta_{y,l} \end{bmatrix}.$$

The vector $\delta_l$ is the random translation parameter of the noise atom such that the random
variables $\{\delta_{x,l}\}_{l=1}^{L}, \{\delta_{y,l}\}_{l=1}^{L} \sim U[-b, b]$ have an i.i.d. uniform distribution. Here, $b$ is a fixed
parameter used to define a region $[-b, b] \times [-b, b] \subset \mathbb{R}^2$ in the image plane, which is considered
as a support region capturing a substantial part of the energy of reference and target images.
The centers of the noise atoms are assumed to be uniformly distributed in this region. In order
to have a realistic noise model, the number of noise units $L \gg K$ is considered to be a very
large number and the scale $\epsilon > 0$ of noise atoms is very small. The parameters $L$ and $\epsilon$ will be
treated as noise model constants throughout the analysis. The coefficients $\zeta_l \sim N(0, \eta^2)$ of the
noise atoms are assumed to be i.i.d. with a normal distribution of variance $\eta^2$.

The continuous-space noise model $w(X)$ is chosen in analogy with the digital i.i.d. Gaussian
noise in the discrete space $\mathbb{R}^n$. The single isotropic scale parameter $\epsilon$ of noise units bears
resemblance to the 1-pixel support of digital noise units. The uniform distribution of the position
$\delta_l$ of noise units is similar to the way digital noise is defined on a uniform pixel grid. The noise
coefficients $\zeta_l$ have an i.i.d. normal distribution as in the digital case. If our noise model $w(X)$
has to approximate the digital Gaussian noise in a continuous setting, the noise atom scale $\epsilon$ is
chosen comparable to the pixel width and $L$ corresponds to the resolution of the discrete image.
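
To make the noise model concrete, the sketch below draws one realization of $w(X)$ on a pixel grid. It assumes the Gaussian mother function $\phi(X) = e^{-X^T X}$, so that each noise unit is exp(-|X - delta_l|^2 / eps^2); the function name `sample_noise`, the grid, and all numeric values are hypothetical choices for illustration.

```python
import numpy as np

def sample_noise(xx, yy, num_atoms, eps, eta, b, rng):
    """Draw one realization of w(X) = sum_l zeta_l * phi(E^{-1}(X - delta_l)).

    Assumes phi(X) = exp(-X^T X) and E = eps * I, so each unit is an isotropic
    Gaussian of scale eps centered at delta_l.
    """
    centers = rng.uniform(-b, b, size=(num_atoms, 2))   # delta_l ~ U[-b, b]^2
    coeffs = rng.normal(0.0, eta, size=num_atoms)       # zeta_l ~ N(0, eta^2)
    w = np.zeros_like(xx, dtype=float)
    for (cx, cy), z in zip(centers, coeffs):
        w += z * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / eps**2)
    return w

# Hypothetical setup: 65 x 65 grid on [-1, 1]^2, atom scale close to the pixel width.
axis = np.linspace(-1.0, 1.0, 65)
xx, yy = np.meshgrid(axis, axis)
w = sample_noise(xx, yy, num_atoms=2000, eps=0.03, eta=0.1, b=1.0,
                 rng=np.random.default_rng(0))
print(w.shape, float(w.std()))
```

Choosing `eps` near the grid spacing and `num_atoms` on the order of the pixel count mirrors the analogy with digital i.i.d. Gaussian noise drawn in the text.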

Let now $p_n$ be a noisy observation of $p$ such that $p_n(X) = p(X) + w(X)$, where $w$ and $p$
are independent according to the noise model (28). We assume that the target pattern is a
translated version of $p_n(X)$ so that it takes the form $p_n(X - tT)$. Then, the noisy distance
function between $p(X)$ and $p_n(X - tT)$ is given by

$$g(tT) = \int_{\mathbb{R}^2} \big(p(X) - p_n(X - tT)\big)^2 dX = \int_{\mathbb{R}^2} \big(p(X) - p(X - tT) - w(X - tT)\big)^2 dX. \qquad (29)$$

This can be written as $g(tT) = f(tT) + h(tT)$, where

$$h(tT) := -2 \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big) \, w(X - tT) \, dX + \int_{\mathbb{R}^2} w^2(X - tT) \, dX. \qquad (30)$$

The function $h$ represents the deviation of $g$ from $f$. We call $h$ the distance deviation function.
We show in Appendix B.1 that the expected value of $h$ is independent of the translation $tT$ and
given by $E[h(tT)] = \frac{\pi}{2} L \eta^2 \epsilon^2$, where $E[\cdot]$ denotes the expectation. Therefore,
$E[g(tT)] = f(tT) + \frac{\pi}{2} L \eta^2 \epsilon^2$ and the global minimum of $E[g(tT)]$ is at $tT = 0$. However, due to the probabilistic
perturbation caused by the noise $w$, the global minimum of $g$ is not at $tT = 0$ in general. We
consider $g$ to have a single global minimum and denote its location by $t_0 T_0$. However, the single
global minimum assumption is not a strict hypothesis of our analysis technique; i.e., the upper
bound that we derive for the distance between $t_0 T_0$ and the origin is still valid if $g$ has more
than one global minimum. In this case, the obtained upper bound is valid for all global minima.
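
The decomposition of the noisy distance into the noiseless distance and the deviation function can be verified on a discrete toy signal. The sketch below is a 1-D, periodic, sampled stand-in for the continuous 2-D setting: sums replace integrals and `np.roll` replaces translation. The signal, the noise level, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = np.arange(n)

# Hypothetical reference pattern (two Gaussian bumps) and white noise.
p = np.exp(-((x - 20) / 4.0) ** 2) - 0.5 * np.exp(-((x - 40) / 6.0) ** 2)
w = 0.05 * rng.standard_normal(n)

def shift(a, t):
    """Periodic translation: returns the samples of a(x - t)."""
    return np.roll(a, t)

def f(t):
    """Noiseless distance between p and its translation."""
    return np.sum((p - shift(p, t)) ** 2)

def g(t):
    """Noisy distance between p and the translated noisy observation p + w."""
    return np.sum((p - shift(p + w, t)) ** 2)

def h(t):
    """Distance deviation: cross term plus the (translation-invariant) noise energy."""
    return -2.0 * np.sum((p - shift(p, t)) * shift(w, t)) + np.sum(shift(w, t) ** 2)

for t in (0, 3, 10):
    print(t, g(t), f(t) + h(t))
```

Because the noise energy term does not change under translation, only the cross term moves the minimum of the noisy distance away from the origin.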
We now continue with the derivation of a probabilistic upper bound on the distance $t_0$
between the location $t_0 T_0$ of the global minimum of $g$ and the location $0$ of the global minimum
of $f$. We show in Appendix B.2 that $t_0$ satisfies the equation



d'fitTo) 



+ 



d^h{tTo) 



df^ 



\h{Q) - h{taTo)\ 



(31) 



t=ti , 



for some $t_1 \in [0, t_0]$ and $t_2 \in [0, t_0]$. Our derivation of an upper bound for $t_0$ will be based on
(31). The above equation shows that $t_0$ can be upper bounded by finding a lower bound on the
term

$$\left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_1} + \left. \frac{d^2 h(tT_0)}{dt^2} \right|_{t=t_2} \qquad (32)$$

and an upper bound for the term $|h(0) - h(t_0 T_0)|$. However, $h$ is a probabilistic function; i.e.,
$h(tT)$ and its derivatives are random variables. Therefore, the upper bound that we will obtain
for $t_0$ is a probabilistic bound given in terms of the variances of $h(0) - h(t_0 T_0)$ and $d^2 h(tT_0)/dt^2$.

In the rest of this section, we proceed as follows. First, in order to be able to bound
$|h(0) - h(tT)|$ probabilistically, we present an upper bound on the variance of $h(0) - h(tT)$ in
Lemma 1, which is then put in a more useful form in Corollary 1 by removing the dependence
of the bound on $tT$. Next, in order to bound the term in (32), we state a lower bound for
$d^2 f(tT)/dt^2$ in Lemma 2. We then present an upper bound for the variance of $d^2 h(tT)/dt^2$
in Lemma 3 and generalize this bound to arbitrary $tT$ vectors in Corollary 2. These results
are finally put together in the main result of this section, namely Theorem 3, where an upper
bound on $t_0$ is obtained based on (31). Theorem 3 applies Chebyshev's inequality to employ
the bounds derived in Corollaries 1 and 2 to define probabilistic upper bounds on the terms
$|h(0) - h(t_0 T_0)|$ and $|d^2 h(tT_0)/dt^2|$. Then, this is combined with the bound on $d^2 f(tT)/dt^2$ in
Lemma 2 to obtain a probabilistic upper bound on $t_0$ from the relation (31).
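
The Chebyshev step in this outline turns a variance bound into a probabilistic magnitude bound: if a zero-mean random variable has standard deviation at most $R$, then its magnitude exceeds $sR$ with probability at most $1/s^2$. The sketch below illustrates that single step with a Monte Carlo check; the helper name `chebyshev_threshold`, the Gaussian test variable, and the numbers are hypothetical, and the way Theorem 3 combines the two events is not reproduced here.

```python
import numpy as np

def chebyshev_threshold(std_bound, s):
    """Return (threshold, failure_prob) with P(|Z - E[Z]| >= threshold) <= failure_prob.

    std_bound is any upper bound on the standard deviation of Z, and s > 0.
    """
    return s * std_bound, 1.0 / s**2

rng = np.random.default_rng(0)
true_std, std_bound, s = 0.5, 0.7, 2.0   # the bound may be loose, as with a uniform bound
threshold, failure_prob = chebyshev_threshold(std_bound, s)

# Zero-mean samples standing in for a quantity such as h(0) - h(tT).
samples = rng.normal(0.0, true_std, size=100_000)
empirical = float(np.mean(np.abs(samples) >= threshold))
print(threshold, failure_prob, empirical)
```

Chebyshev's inequality makes no distributional assumption, which is why the guaranteed failure probability is typically much larger than the empirical one for a Gaussian variable.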

We begin with bounding the variance of the term $h(0) - h(tT)$ in order to find an upper bound
for the right hand side of (31). Let us denote

$$\Delta h(tT) := h(0) - h(tT).$$

From (30), $\Delta h(tT)$ is given by

$$\Delta h(tT) = h(0) - h(tT) = 2 \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big) \, w(X - tT) \, dX$$

where we have used the fact that $\int_{\mathbb{R}^2} w^2(X - tT) \, dX = \int_{\mathbb{R}^2} w^2(X) \, dX$. Let $\sigma^2_{\Delta h(tT)}$ denote the
variance of $\Delta h(tT)$. In the following lemma, we state an upper bound on $\sigma^2_{\Delta h(tT)}$. Let us define
beforehand the following constants for the $k$-th atom of $p$



T^Wk\\E\ 



(33) 
(34) 



^/Wi + E^\ 

Also, let $J^-$ and $J^+$ denote the sets of $(j, k)$ indices with negative and positive coefficient products

$$J^- = \{ (j, k) : \lambda_j \lambda_k < 0 \}, \qquad J^+ = \{ (j, k) : \lambda_j \lambda_k > 0 \}. \qquad (35)$$

Lemma 1. An upper bound $R_{\sigma^2_{\Delta h(tT)}}$ on the variance $\sigma^2_{\Delta h(tT)}$ of $\Delta h(tT)$ is given by



'^Ah{tT) 



- ^'^lH(tT) "^^^^ { H AjKjAfcKfeCjfe+ >^]l^l >^kKk ^jkj 



(36) 



where the terms $\mathfrak{c}_{jk}$ and $\mathfrak{d}_{jk}$ depend on $t$, $T$, and the atom parameters of $p$. In particular, $\mathfrak{c}_{jk}$ and
$\mathfrak{d}_{jk}$ are bounded functions of $t$ and $T$ that are obtained in terms of exponentials of second-degree
polynomials of $t$ and $T$ with negative leading coefficients.
The proof of Lemma 1 is given in Appendix B.3. The upper bound on the variance of $\Delta h(tT)$
given in Lemma 1 has a dependence on the amount $t$ and the direction $T$ of the translation
since the terms $\mathfrak{c}_{jk}$ and $\mathfrak{d}_{jk}$ depend on $t$ and $T$. In the derivation of a bound for $t_0$, the direction
$T_0$ of the global minimum of $g$ will however be treated as an arbitrary and unknown unit-norm
vector. For this reason, we would like to generalize the result of the lemma such that it does not
depend on the direction $T$. Moreover, considering the complicated dependence of $R_{\sigma^2_{\Delta h(tT)}}$ on $t$
seen in the proof of Lemma 1, we would like to have a bound on $\sigma^2_{\Delta h(tT)}$ that is independent of
$t$ as well, since this would make the estimation of $t_0$ using (31) easier. In the following corollary,
we build on Lemma 1 and give a uniform upper bound on $\sigma^2_{\Delta h(tT)}$ over the closed ball of radius
$\bar{t}_0 > 0$, $B_{\bar{t}_0}(0) = \{ tT : T \in S^1, \ 0 \le t \le \bar{t}_0 \}$. The upper bound on $\sigma^2_{\Delta h(tT)}$ is thus independent
of $tT$ and valid for all $tT$ vectors in $B_{\bar{t}_0}(0)$. Here the parameter $\bar{t}_0$ is considered to be a known
threshold for $t_0$, such that $t_0 \le \bar{t}_0$. This parameter will be assigned a specific value in Theorem 3.

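Corollary 1. Let $\bar{t}_0 > 0$, and let $tT \in B_{\bar{t}_0}(0)$. Then, the variance $\sigma^2_{\Delta h(tT)}$ of $\Delta h(tT)$ can be
upper bounded as

$$\sigma^2_{\Delta h(tT)} \le R_{\sigma^2_{\Delta h}} := C_{\sigma^2_{\Delta h}} \, \eta^2 \qquad (37)$$

where

$$C_{\sigma^2_{\Delta h}} := 4L \left( \sum_{(j,k) \in J^+} \lambda_j \kappa_j \lambda_k \kappa_k \, \overline{\mathfrak{c}}_{jk} + \sum_{(j,k) \in J^-} \lambda_j \kappa_j \lambda_k \kappa_k \, \underline{\mathfrak{d}}_{jk} \right).$$

Here the terms $\overline{\mathfrak{c}}_{jk}$ and $\underline{\mathfrak{d}}_{jk}$ are constants depending on $\bar{t}_0$ and the atom parameters of $p$. In
particular, $\overline{\mathfrak{c}}_{jk}$ and $\underline{\mathfrak{d}}_{jk}$ are bounded functions of $\bar{t}_0$, given in terms of exponentials of second-degree
polynomials of $\bar{t}_0$ with negative leading coefficients.

The proof of Corollary 1 is presented in Appendix B.4. In the proof, a uniform upper bound
$\overline{\mathfrak{c}}_{jk}$ and a uniform lower bound $\underline{\mathfrak{d}}_{jk}$ are derived respectively for the parameters $\mathfrak{c}_{jk}$, $\mathfrak{d}_{jk}$; and
then the result of Lemma 1 is used.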

We have thus stated a uniform upper bound $R_{\sigma^2_{\Delta h}}$ for the variance of $\Delta h(tT)$, which will be
used to derive an upper bound for the right hand side of (31) in Theorem 3. We now continue
with the examination of the left hand side of (31). We begin with the term $d^2 f(tT)/dt^2$. The
following lemma gives a lower bound on the second derivative of the noiseless distance function
$f(tT)$ in terms of the pattern parameters.

Lemma 2. The second derivative of $f(tT)$ along the direction $T$ can be uniformly lower bounded
for all $t \in [0, \bar{t}_0]$ and for all directions $T \in S^1$ as follows

$$\frac{d^2 f(tT)}{dt^2} \ge r_0 + r_2 \bar{t}_0^2 + r_3 \bar{t}_0^3. \qquad (38)$$

Here $r_0 > 0$, $r_2 < 0$, and $r_3 < 0$ are constants depending on the atom parameters of $p$.
In particular, Tq, rj, r^ are obtained from the eigenvalues of some matrices derived from the 
parameters Xj, tj, Qjk, '^jk- 

The proof of Lemma 2 is given in Appendix B.5. The above lower bound on the second
derivative of $f(tT)$ is independent of the direction $T$ and the amount $t$ of translation, provided
that $t$ is in the interval $[0, \bar{t}_0]$. In fact, the statement of Lemma 2 is general in the sense that
$\bar{t}_0$ can be any positive scalar. However, in the proof of Theorem 3, we use Lemma 2 for the
value $t_0$ that represents the deviation between the global minima of $f$ and $g$.

The result of Lemma 2 will be used in Theorem 3 in order to lower bound the second
derivative of $f$ in (31). We now continue with the term $d^2 h(tT)/dt^2$ in (31). Let $h''(tT) :=
d^2 h(tT)/dt^2$ denote the second derivative of the deviation function $h$ along the direction $T$.
Since $h''(tT)$ can take both positive and negative values, in the calculation of a lower bound
for the term (32), we need a bound on the magnitude $|h''(tT)|$ of this term. It can be bounded
probabilistically in terms of the variance of $h''(tT)$. We thus state the following result on the
variance of $h''(tT)$.

Lemma 3. Let $\sigma^2_{h''(tT)}$ denote the variance of $h''(tT)$. Then, $\sigma^2_{h''(tT)}$ can be upper bounded as

^ij:k)eJ+ ij,k)eJ- 

where the terms $\mathfrak{e}_{jk}$ and $\mathfrak{f}_{jk}$ depend on $t$, $T$, and the atom parameters of $p$. In particular, $\mathfrak{e}_{jk}$
and $\mathfrak{f}_{jk}$ are bounded functions of $t$ and $T$ that are derived in terms of polynomial functions of $t$,
$T$ and exponentials of second-degree polynomials of $t$ and $T$ with negative leading coefficients.

The proof of Lemma 3 is given in Appendix B.6. Lemma 3 defines an upper bound on
$\sigma^2_{h''(tT)}$ that depends on the amount $t$ and direction $T$ of the translation between the reference
and target patterns. The following corollary generalizes the result of Lemma 3 in order to
define a uniform bound $R_{\sigma^2_{h''}}$ for $\sigma^2_{h''(tT)}$ over $B_{\bar{t}_0}(0)$, where $R_{\sigma^2_{h''}}$ is independent of $t$ and $T$.

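Corollary 2. Let $\bar{t}_0 > 0$, and let $tT \in B_{\bar{t}_0}(0)$. Then, the variance $\sigma^2_{h''(tT)}$ of $h''(tT)$ can be
upper bounded as

$$\sigma^2_{h''(tT)} \le R_{\sigma^2_{h''}} := C_{\sigma^2_{h''}} \, \eta^2 \qquad (39)$$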



where 



^'^l" ^ XjXkKjKk <Bjk + Y, ^J^kKjKk Ijk 

^U.k)eJ+ {3.k)e.J- 

Here ejk is a constant depending on the atom parameters of p; and the term f^j, depends on the 
atom parameters of p and to- particular, ejk is given in terms of rational functions of the 
eigenvalues of ^k matrices; and Ijk is a bounded function of to given in terms of exponentials 
of second-degree polynomials of to with negative leading coefficients. 

The proof of Corollary 2 is given in Appendix B.7. The proof derives respective uniform
upper and lower bounds $\overline{\mathfrak{e}}_{jk}$, $\underline{\mathfrak{f}}_{jk}$ for the parameters $\mathfrak{e}_{jk}$, $\mathfrak{f}_{jk}$ in Lemma 3 and applies the result
of Lemma 3.

Now we are ready to give our main result about the bound on the alignment error. The
following theorem states an upper bound on the distance between the locations of the global
minima of $f$ and $g$ in terms of the noise standard deviation $\eta$ and the atom parameters of $p$,
provided that $\eta$ is smaller than a threshold $\eta_0$. The threshold $\eta_0$ is obtained from the bounds derived
in Corollaries 1, 2 and Lemma 2 such that the condition $\eta \le \eta_0$ guarantees that the assumption
$t_0 \le \bar{t}_0$ holds. In the theorem, the parameter $\bar{t}_0$, which is treated as a predefined threshold on
$t_0$ in the previous lemma and corollaries, is also assigned a specific value in terms of the constants
$r_0$, $r_2$, $r_3$ of Lemma 2.

Theorem 3. Let 



2\r,\+2V\l/'\r^\V^ 



(40) 



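Let $R_{\sigma_{\Delta h}} := \sqrt{R_{\sigma^2_{\Delta h}}}$ and $R_{\sigma_{h''}} := \sqrt{R_{\sigma^2_{h''}}}$, where $R_{\sigma^2_{\Delta h}}$ and $R_{\sigma^2_{h''}}$ are as defined in (37) and
(39), and evaluated at the value of $\bar{t}_0$ given above. Also, let $c_{\sigma_{\Delta h}} := \sqrt{C_{\sigma^2_{\Delta h}}}$ and $c_{\sigma_{h''}} := \sqrt{C_{\sigma^2_{h''}}}$.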

Assume that for some s > the noise standard deviation rj is smaller than rjo such that 



^0 Ho 



v<m--=- '-^2 • (41) 
Then, with probability at least 1— , the distance $t_0$ between the location $0$ of the global minimum
of $f$ and the location $t_0 T_0$ of the global minimum of $g$ is bounded as



The proof of Theorem 3 is given in Appendix B.8. In the proof, we make use of the upper
bounds $R_{\sigma^2_{\Delta h}}$, $R_{\sigma^2_{h''}}$ on $\sigma^2_{\Delta h(tT)}$, $\sigma^2_{h''(tT)}$, and the lower bound on $d^2 f(tT)/dt^2$ given in (38).

The upper bound $R_{t_0}$ in (42) shows that the alignment error increases with the increase in the
noise level, since $R_{\sigma_{\Delta h}}$ and $R_{\sigma_{h''}}$ are linearly proportional to the noise standard deviation $\eta$.
The increase of the error with the noise is expected. It can also be seen from (42) that the
increase in the term $r_0$, which is proportional to the second derivative of the noiseless distance
$f$, reduces the alignment error; whereas an increase in the term $R_{\sigma_{h''}}$, which is related to the
second derivative of $h$, increases the error. This can be explained as follows. If $f$ has a sharp
increase around its global minimum at 0, i.e., $f$ has a large second derivative, the location of its
minimum is less affected by $h$. Likewise, if the distance deviation function $h$ has a large second
derivative, it introduces a larger alteration around the global minimum of $f$, which causes a
bigger perturbation on the position of the minimum.

Theorem |3] states a bound on to under the condition that the noise standard deviation 77 
is smaller than the threshold value rjo, which depends on the pattern parameters (through the 
terms Tq, rg, to) as well as the noise parameters L and e (through the terms Co-^^ and Co-^,,). 
The threshold rjo thus defines an admissible noise level such that the change in the location of 
the global minimum of / can be properly upper bounded. This admissible noise level is derived 
from the condition Rtg < which is partially due to our proof technique. However, we remark 
that the existence of such a threshold is intuitive in the sense that it states a limit on the noise 
power in comparison with the signal power. Note also that the denominator — s ^a-^,, of Rto 
should be positive, which also yields a condition on the noise level 



However, this condition is already satisfied due to the hypothesis rj < rjo of the theorem, since 



770 < 77o from (41) 



4.2 Generalization of the Alignment Error Bound to Arbitrary Noise Models

Here, we generalize the results of the previous section in order to derive an alignment error 
bound for arbitrary noise patterns. In general, the characteristics of the noise pattern vary 
depending on the imaging application. In particular, while the noise pattern may have high 
correlation with the reference pattern in some applications (e.g., noise resulting from geometric 
deformations of the pattern), its correlation with the reference pattern may be small in some 
other settings where the noise stems from a source that does not depend on the image. We thus 
focus on two different scenarios. In the first and general setting, we do not make any assumption 
on the noise characteristics and bound the alignment error in terms of the norm of the noise 
pattern. Then, in the second setting, we consider that the noise pattern has small correlation 
with the points on the translation manifold of the reference pattern and show that the alignment 
error bound can be made sharper in this case. 

We assume that the reference pattern $p(X)$ is noiseless and we write the target pattern as
$p_g(X - tT)$, where $p_g(X) = p(X) + z(X)$ is a generalized noisy observation of $p$ such that $z \in L^2(\mathbb{R}^2)$
is an arbitrary noise pattern. Then, the generalized noisy distance function is

$$g_g(tT) = \int_{\mathbb{R}^2} \big(p(X) - p_g(X - tT)\big)^2 dX$$

and the generalized deviation function is $h_g(tT) = g_g(tT) - f(tT)$. Let us call $u_0 U_0$ the point where
$g_g$ has its global minimum. Then the distance between the global minima of $g_g$ and $f$ is given
by $u_0$.

We begin with the first setting and state a generic bound for the alignment error $u_0$ in terms
of the norm of the noise

$$\nu := \left( \int_{\mathbb{R}^2} z^2(X) \, dX \right)^{1/2}. \qquad (43)$$

We also denote the norm of $p$ by

$$R_p := \left( \int_{\mathbb{R}^2} p^2(X) \, dX \right)^{1/2}. \qquad (44)$$



We will use the following lemma in our main result, which states a uniform upper bound on 
the norm of the second derivative of p{X + tT) with respect to t that is valid for all t and T. 

Lemma 4. The norm of the second derivative of p{X + tT) along any direction T can be 
uniformly upper bounded as 



d^p{X + tT) 

[R2 \ dt 



-,1/2 



dX 



L.\^ 74- I A J- / 



1/2 



(45) 



where $\mathfrak{g}_{jk}$ and $\mathfrak{h}_{jk}$ are constants depending on the atom parameters of $p$. The constants $\mathfrak{g}_{jk}$ and
$\mathfrak{h}_{jk}$ are given in terms of the parameters $Q_{jk}$ and the square roots of rational functions of the
scale parameters $\sigma_{x,k}$, $\sigma_{y,k}$ of the atoms of $p$.

The proof of Lemma 4 is given in Appendix C.1.

We are now ready to state our generalized alignment error result for arbitrary noise patterns. 



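Theorem 4. Let $\bar{t}_0$ be defined as in (40). Assume that the norm $\nu$ of $z$ is smaller than $\nu_0$ such
that

$$\nu \le \nu_0 := \frac{\bar{t}_0^2 \, r_0}{8 R_p + 2 R_{p''} \bar{t}_0^2} \qquad (46)$$

where $r_0$ is the constant in Lemma 2. Then, the distance $u_0$ between the global minima of $f$ and
$g_g$ is bounded as

$$u_0 \le R_{u_0} := \sqrt{\frac{8 R_p \nu}{r_0 - 2 R_{p''} \nu}}. \qquad (47)$$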



Theorem 4 is proven in Appendix C.2. The theorem states an upper bound on the alignment
error for the general case where the only information used about the noise pattern is its norm.
The alignment error bound $R_{u_0}$ is a generalized and deterministic version of the probabilistic
bound $R_{t_0}$ derived for the Gaussian noise model. In the proof of the theorem, the change
$h_g(0) - h_g(u_0 U_0)$ in the deviation function is bounded by $4 R_p \nu$. The second derivative of the
noiseless distance function $f$ is captured by $r_0$ as in the Gaussian noise case. Finally, the result
of Lemma 4 is used to derive the bound $2 R_{p''} \nu$ for the second derivative of the deviation $h_g$.
Based on these, the stated result is then obtained by following similar steps as in Section 4.1.
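
A small numerical sketch can make the roles of the quantities in this bound easier to see. It assumes that the threshold and the bound have the forms $\nu_0 = \bar{t}_0^2 r_0 / (8 R_p + 2 R_{p''} \bar{t}_0^2)$ and $R_{u_0} = \sqrt{8 R_p \nu / (r_0 - 2 R_{p''} \nu)}$, which is how (46) and (47) are read here; the function names and every numeric value below are hypothetical.

```python
import math

def noise_norm_threshold(t0_bar, r0, R_p, R_p2):
    """Admissible noise norm nu_0 (assumed form of (46)); R_p2 stands for R_{p''}."""
    return t0_bar**2 * r0 / (8.0 * R_p + 2.0 * R_p2 * t0_bar**2)

def alignment_error_bound(nu, r0, R_p, R_p2):
    """Alignment error bound R_{u_0} (assumed form of (47)) for noise norm nu."""
    denom = r0 - 2.0 * R_p2 * nu
    if denom <= 0:
        raise ValueError("noise norm too large: r0 - 2 * R_p2 * nu must be positive")
    return math.sqrt(8.0 * R_p * nu / denom)

# Hypothetical constants; in the analysis they come from the atom parameters of p.
t0_bar, r0, R_p, R_p2 = 0.5, 4.0, 1.0, 3.0
nu0 = noise_norm_threshold(t0_bar, r0, R_p, R_p2)

for nu in (0.1 * nu0, 0.5 * nu0, nu0):
    print(nu, alignment_error_bound(nu, r0, R_p, R_p2))
```

Under these assumed forms, the threshold is exactly the noise norm at which the error bound reaches $\bar{t}_0$, so the condition on $\nu$ keeps the bound inside the region where the lower bound on the second derivative of $f$ applies.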

We now continue with the second setting where the noise pattern $z$ has small correlation
with the points on the translation manifold $\mathcal{M}(p)$ of $p$. We characterize the correlation of
two patterns with their inner product. Assume that a uniform correlation upper bound $r_{pz}$ is
available such that

$$\left| \int_{\mathbb{R}^2} p(X + tT) \, z(X) \, dX \right| \le r_{pz} \qquad (48)$$

for all $t$ and $T$. The following corollary builds on Theorem 4 and states that the bound on the
alignment error can be made sharper if the correlation bound is sufficiently small.
Corollary 3. Let $\bar{t}_0$ be defined as in (40) and let a uniform upper bound $r_{pz}$ for the correlation
be given such that $r_{pz} \le \bar{t}_0^2 \, r_0 / 8$. Assume that the norm $\nu$ of $z$ is smaller than $\nu_0$ such that

$$\nu \le \nu_0 := \frac{\bar{t}_0^2 \, r_0 - 8 r_{pz}}{2 R_{p''} \bar{t}_0^2}. \qquad (49)$$

Then, the distance $u_0$ between the global minima of $f$ and $g_g$ is bounded as

$$u_0 \le Q_{u_0} := \sqrt{\frac{8 r_{pz}}{r_0 - 2 R_{p''} \nu}}. \qquad (50)$$



The proof of Corollary 3 is given in Appendix C.3. One can observe that the alignment
error bound $Q_{u_0}$ approaches zero as the uniform correlation bound $r_{pz}$ approaches zero. Therefore,
if $r_{pz}$ is sufficiently small, $Q_{u_0}$ will be smaller than the general bound $R_{u_0}$. This shows that,
regardless of the noise level, the alignment error is close to zero if the noise pattern $z$ is almost
orthogonal to the translation manifold $\mathcal{M}(p)$ of the reference pattern.

4.3 Influence of Filtering on Alignment Error 

In this section, we examine how the alignment error resulting from the image noise is affected
when the reference and target patterns are low-pass filtered. We consider the Gaussian kernel
$\frac{1}{\pi \rho^2} \phi_\rho(X)$ defined in Section 3.3 and analyze the dependence of the alignment error bounds
obtained for the Gaussian noise and generalized noise models in Sections 4.1 and 4.2 on the
filter size $\rho$ and the noise level parameters $\eta$ and $\nu$.

We begin with the Gaussian noise model $w(X)$. The filtered reference pattern $\hat{p}(X)$ and the
filtered noisy observation $\hat{p}_n(X)$ of the reference pattern are given by

$$\hat{p}(X) = \sum_{k=1}^{K} \hat{\lambda}_k \, \phi_{\hat{\gamma}_k}(X)$$

$$\hat{p}_n(X) = \hat{p}(X) + \hat{w}(X) = \sum_{k=1}^{K} \hat{\lambda}_k \, \phi_{\hat{\gamma}_k}(X) + \sum_{l=1}^{L} \hat{\zeta}_l \, \phi_{\hat{\xi}_l}(X).$$



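Remember from Section 3.3 that the rotation and translation parameters of the atoms of
$\hat{p}(X)$ do not depend on $\rho$; and the scale matrices vary with $\rho$ such that $\hat{\sigma}_k^2 = \sigma_k^2 + \rho^2 I$. The
parameters of the smoothed noise atoms can be obtained similarly to the parameters of the
atoms in $\hat{p}$; i.e., $\phi_{\hat{\xi}_l}(X) = \phi(\hat{E}^{-1}(X - \delta_l))$, where $\hat{E}^2 = E^2 + \rho^2 I$. This gives the scale parameter
of smoothed noise atoms, which is written as

$$\hat{\epsilon} = \sqrt{\epsilon^2 + \rho^2}. \qquad (51)$$

The smoothed noise coefficients are then given by

$$\hat{\zeta}_l = \frac{|E|}{|\hat{E}|} \, \zeta_l = \frac{\epsilon^2}{\epsilon^2 + \rho^2} \, \zeta_l. \qquad (52)$$

Since all the coefficients $\zeta_l$ are multiplied by a factor of $\epsilon^2 / (\epsilon^2 + \rho^2)$, the variance of the smoothed
noise atom coefficients is

$$\hat{\eta}^2 = \left( \frac{\epsilon^2}{\epsilon^2 + \rho^2} \right)^2 \eta^2. \qquad (53)$$

As the noise atom units are considered to have very small scale, one can assume first that
$\rho \gg \epsilon$ for typical values of the filter size $\rho$. Then the relations in (51) and (53) give the joint
variations of $\hat{\epsilon}$ and $\hat{\eta}$ with $\eta$ and $\rho$ as

$$\hat{\epsilon} = O(\rho), \qquad \hat{\eta} = O(\eta \rho^{-2}). \qquad (54)$$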
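
The asymptotic relations above follow directly from the two smoothing formulas, which the sketch below evaluates for a few filter sizes. It assumes the isotropic forms $\hat{\epsilon} = \sqrt{\epsilon^2 + \rho^2}$ and $\hat{\eta} = \eta \, \epsilon^2 / (\epsilon^2 + \rho^2)$ described in the text; the function name and numeric values are hypothetical.

```python
import math

def smoothed_noise_params(eps, eta, rho):
    """Scale and coefficient standard deviation of noise atoms after smoothing.

    Assumed forms: eps_hat = sqrt(eps^2 + rho^2), eta_hat = eta * eps^2 / (eps^2 + rho^2).
    """
    eps_hat = math.sqrt(eps**2 + rho**2)
    eta_hat = eta * eps**2 / (eps**2 + rho**2)
    return eps_hat, eta_hat

eps, eta = 0.01, 0.2   # hypothetical small atom scale and noise level
for rho in (0.1, 1.0, 10.0):
    eps_hat, eta_hat = smoothed_noise_params(eps, eta, rho)
    # For rho >> eps: eps_hat / rho tends to 1 and eta_hat * rho^2 / eta tends to eps^2.
    print(rho, eps_hat / rho, eta_hat * rho**2 / eta)
```

The two printed ratios settle to constants once the filter size dominates the noise atom scale, which is the content of the order relations for $\hat{\epsilon}$ and $\hat{\eta}$.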
We first state the following lemma, which summarizes how the parameters $r_0$ and $R_{\sigma_{\Delta h}}$ used
in the alignment error bound vary with $\rho$ and $\eta$.

Lemma 5. The smoothed value $\hat{r}_0$ of the parameter $r_0$ associated with the second derivative of
$f$ has the variation

ro = 0((l + pr^) 

with $\rho$. Also, the smoothed value $\hat{R}_{\sigma_{\Delta h}}$ of the uniform upper bound $R_{\sigma_{\Delta h}}$ on the standard
deviation of $\Delta h(tT)$ has the joint variation

Ra^, ^ O {tJ p-') 

with $\eta$ and $\rho$.

The proof of Lemma 5 is given in Appendix D.1.

Now, we are ready to state the dependence of the bound $\hat{R}_{t_0}$ on $\rho$ and $\eta$ in the following
main result.

Theorem 5. The joint variation of the alignment error bound Rt^ for the smoothed image pair 
with respect to rj and p is given by 

Therefore, for a fixed noise level, $\hat{R}_{t_0}$ increases at a rate of $O\big(\rho^{3/2} (1 - \rho)^{-1/2}\big)$ with the increase in the filter size $\rho$. Similarly, for a fixed filter size, the rate of increase of $\hat{R}_{t_0}$ with the noise standard deviation $\eta$ is $O\big(\eta^{1/2} (1 - \eta)^{-1/2}\big)$.

The proof of Theorem [s] is presented in Appendix 
variation of 0{r] p~^) with rj and p. Then, combining this with the results of Lemma [5| yields 
the stated result for the alignment error bound. 

Theorem 5 constitutes the summary of our analysis about the effect of filtering on the alignment accuracy for the Gaussian noise model. While the aggravation of the alignment error with the increase in the noise level is an intuitive result, the theorem states that filtering the patterns in the presence of noise decreases the accuracy of alignment as well. Remember that this is not the case for noiseless patterns. The result of the theorem can be interpreted as follows. Smoothing the reference and target patterns diffuses the perturbation on the distance function, which is likely to cause a bigger shift in the minimum of the distance function and hence reduce the accuracy of alignment. The estimation $\hat{R}_{t_0} = O\big(\rho^{3/2} (1 - \rho)^{-1/2}\big)$ of the alignment error suggests that the dependence of the error on $\rho$ is between linear and quadratic for small values of $\rho$, whereas it starts to increase more dramatically when $\rho$ takes larger values. Similarly, $\hat{R}_{t_0}$ is proportional to the square root of $\eta$ for small $\eta$ and it increases at a sharper rate as $\eta$ grows.
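The two regimes of the noise dependence can be made concrete with a small numerical sketch. It assumes that the constants hidden in the $O(\cdot)$ notation are equal to 1 and that $\eta$ is normalized so that $\eta < 1$; the function name `noise_rate` is hypothetical.

```python
import math

def noise_rate(eta):
    """Rate eta^(1/2) * (1 - eta)^(-1/2), with all hidden constants set to 1."""
    return math.sqrt(eta / (1.0 - eta))

for eta in (0.001, 0.01, 0.1, 0.5, 0.9):
    # The ratio to sqrt(eta) stays close to 1 for small eta and grows as eta approaches 1.
    print(eta, noise_rate(eta), noise_rate(eta) / math.sqrt(eta))
```

For small $\eta$ the rate is practically indistinguishable from $\sqrt{\eta}$, while the factor $(1 - \eta)^{-1/2}$ dominates as $\eta$ grows.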

Next, we look at the variation of the bounds $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ for arbitrary noise patterns, which are respectively obtained for the general and small-correlation cases. We present the following theorem, which is the counterpart of Theorem 5 for arbitrary noise models.

Theorem 6. The alignment error bounds $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ for arbitrary noise patterns have a variation of

with the noise level $\nu$ and the filter size $\rho$. Therefore, for a fixed noise level, the errors $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ increase at a rate of $O\big((1 + \rho^2)^{1/2}\big)$ with the increase in the filter size $\rho$. Similarly, for a fixed filter size, $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ increase at a rate of $O\big(\nu^{1/2} (1 - \nu)^{-1/2}\big)$ with respect to the noise norm $\nu$.



D.2 



We first show that Rn, ,, has a 
The proof of Theorem 6 is given in Appendix E.1. The dependence of the generalized bounds $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ on the noise norm $\nu$ is the same as the dependence of $\hat{R}_{t_0}$ on $\eta$. However, the variation of $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ with $\rho$ is seen to be slightly different from that of $\hat{R}_{t_0}$. This stems from the difference between the two models. In the generalized noise model $z$, we have treated the norm $\nu$ of $z$ as a known fixed number and we have characterized the alignment error in terms of $\nu$. On the other hand, $w$ is a probabilistic Gaussian noise model; therefore, it is not possible to bound its norm with a fixed parameter. For this reason, the alignment error for $w$ has been derived probabilistically in terms of the standard deviations of the involved parameters. Since the filter size $\rho$ affects the norm of $z$ and the standard deviations of the terms related to $w$ in different ways, it has a different effect on these two types of alignment error bounds. The reason why the two error bounds have the same kind of dependence on the noise level parameters $\eta$ and $\nu$ can be explained similarly. The standard deviations of the terms related to $w$ have a simple linear dependence on $\eta$, which is the same as the dependence of the counterparts of these terms in the generalized model on $\nu$.



5 Experiments 

5.1 Evaluation of Alignment Regularity Analysis 

We first evaluate our theoretical results about SIDEN estimation with an experiment that compares the estimated SIDEN to the true SIDEN. We generate a reference pattern consisting of 40 randomly selected Gaussian atoms with random coefficients, and choose a random unit direction $T$ for pattern displacement. Then, we determine the distance $\omega_T$ of the true SIDEN boundary from the origin along $T$, and compare it to its estimation $\delta_T$ for a range of filter sizes $\rho$. (With an abuse of notation, the parameter denoted as $\omega_T$ here corresponds in fact to $\sup \omega_T$ in the definition of SIDEN.) The distance $\omega_T$ is computed by searching the first zero-crossing of $df(tT)/dt$ numerically, while its estimate $\delta_T$ is computed according to Theorem 1. We repeat the experiment 300 times with different random reference patterns $p$ and directions $T$ and average the results of the cases where $df(tT)/dt$ has zero-crossings for all values of $\rho$ (i.e., 56% of the tested cases). The distance $\omega_T$ and its estimation $\delta_T$ are plotted in Figure 2. The figure shows that $\delta_T$ has an approximately linear dependence on $\rho$. This is an expected behavior, since $\delta_T = O\big((1 + \rho^2)^{1/2}\big) \approx O(\rho)$ for large $\rho$. The estimate $\delta_T$ is smaller than $\omega_T$ since it is a lower bound for $\omega_T$. Its variation with $\rho$ is seen to capture well the relative variations of the true SIDEN boundary $\omega_T$ with $\rho$.
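The numerical search for the first zero-crossing of $df(tT)/dt$ can be sketched as follows. This is a minimal illustration and not the implementation used in the experiments: the function name `first_zero_crossing`, the sampling density and the toy derivative $\sin t$ are assumptions made for the example. In the actual setting, `dfdt` would evaluate the derivative of the distance function along the chosen direction $T$.

```python
import numpy as np

def first_zero_crossing(dfdt, t_max, n=2001):
    """Return the smallest t in (0, t_max] where dfdt changes sign, or None.

    dfdt must accept a NumPy array of t values. The point t = 0 is skipped,
    since the derivative vanishes there at the global minimum.
    """
    t = np.linspace(0.0, t_max, n)[1:]
    v = dfdt(t)
    sign_change = np.where(v[:-1] * v[1:] < 0)[0]
    if sign_change.size == 0:
        return None
    i = sign_change[0]
    # Refine the crossing location by linear interpolation between the two samples.
    return t[i] - v[i] * (t[i + 1] - t[i]) / (v[i + 1] - v[i])

# Toy derivative (hypothetical): positive on (0, pi), so the first crossing is at pi.
print(first_zero_crossing(np.sin, 10.0))
```

A return value of `None` corresponds to the cases excluded from the averages above, where no zero-crossing is found in the search range.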



5.2 Evaluation of Alignment Accuracy Analysis 

We now present experimental results evaluating the alignment error bounds derived in Section 4. We conduct the experiments on reference and target patterns made up of Gaussian atoms, where the target pattern is generated by corrupting the reference pattern with noise and applying a random translation $tT$. In all experiments, an estimation $t_e T_e$ of $tT$ is computed by aligning the reference and target images with a gradient descent algorithm,* which gives the experimental alignment error as $\|tT - t_e T_e\|$. The experimental error is then compared to the theoretical bounds derived in Section 4.

*In the computation of $t_e T_e$, in order to be able to handle large translations, before the optimization with gradient descent we first do a coarse preregistration of the reference and target images with a search on a coarse grid in the translation parameter domain, whose construction is explained in Section 5.3.
Figure 2: The variations of the true distance $\omega_T$ of the boundary of $S$ to the origin and its estimation $\delta_T$ with respect to the filter size $\rho$ (horizontal axis: kernel size $\rho$).



5.2.1 Gaussian noise model 

In the first set of experiments, we evaluate the results for the Gaussian noise model. We compare the experimental alignment error to the theoretical bound given in Theorem 3. In all experiments, the parameter $s$ in Theorem 3, which controls the probability, is chosen such that $|t_0| < R_{t_0}$ holds with probability greater than 0.5. For each reference pattern, the experiment is repeated for a range of values for noise variances $\eta^2$ and filter sizes $\rho$. The maximum value of the noise standard deviation is taken as the admissible noise level $\eta_0$ in Theorem 3.

We first experiment on reference patterns built with 20 Gaussian atoms with randomly chosen parameters. The atom coefficients $\lambda_k$ in the reference patterns are drawn from a uniform distribution in $[-1, 1]$, and the position and scale parameters of the atoms are selected such that $\tau_x, \tau_y \in [-4, 4]$ and $\sigma_x, \sigma_y \in [0.3, 2]$. The noise model parameters are set as $L = 750$, $\epsilon = 0.1$. The experiment is repeated on 50 different reference patterns. Then, 50 noisy target patterns are generated for each reference pattern according to the Gaussian noise model $w$ in (28) with a random translation $tT$ in the range $tT_x, tT_y \in [-4, 4]$. The results are averaged over all reference and target patterns. In Figure 3, the experimental and theoretical values of the alignment error are plotted with respect to the filter size $\rho$, where different curves correspond to different $\eta$ values. Figures 3(a) and 3(b) show respectively the experimental value $\|tT - t_e T_e\|$ and the theoretical upper bound $R_{t_0}$ of the alignment error. Figure 4 shows the same results, where the error is given as a function of $\eta$. The experimental values and the theoretical bounds are given respectively in Figures 4(a) and 4(b).



The results in Figure 3 show that, although the theoretical upper bound is pessimistic (which is due to the fact that the bound is a worst-case analysis), the variation of the experimental value of the alignment error as a function of the filter size is in agreement with that of the theoretical bound. The experimental behavior of the error conforms to the theoretical prediction $R_{t_0} \approx O\big(\rho^{3/2} (1 - \rho)^{-1/2}\big)$ of Theorem 5. Next, the plots of Figure 4 suggest that the variation of the theoretical bound $R_{t_0}$ as a function of $\eta$ is consistent with the result of Theorem 5, which can be approximated as $R_{t_0} \approx O(\sqrt{\eta})$ for small values of $\eta$. On the other hand, the experimental value of the alignment error seems to exhibit a more linear behavior. However, this type of dependence is not completely unexpected. Theorem 5 predicts that $R_{t_0}$ is of $O(\sqrt{\eta})$



*The bound R 2 given in Corollary I2I is derived from the preliminary bound R 2 in Lemma \M In the 
h" \J h"(tr) □ 

implementation of Theorem 31 in order to obtain a sharper estimate of R^2 , we compute it by searching the 
maximum value of o-L/ftD over t and T from the expressions for £j and J-', used in the derivation of R 2 

^ ' h"(tT) 
Figure 3: Alignment error of random patterns as a function of filter size $\rho$. Panels: (a), (b).



for small $\eta$, and $O\big(\eta^{1/2} (1 - \eta)^{-1/2}\big)$ for large $\eta$, while the experimental value of the error can be rather described as $\|tT - t_e T_e\| \approx O(\eta)$, which is between these two orders of variation. In order to examine the dependence of the error on $\eta$ in more detail, we have repeated the same experiments with much higher values of $\eta$. The experimental alignment error is given in Figure 5, where the error is plotted with respect to the noise standard deviation in Figure 5(a) and the filter size in Figure 5(b). The results show that, at high noise levels, the variation of the error with $\eta$ indeed increases above the linear rate $O(\eta)$. The noise levels tested in this high-noise experiment are beyond the admissible noise level derived in Theorem 3; therefore, we cannot apply Theorem 3 directly in this experiment. However, in view of Theorem 5, which states that the error is of $O\big(\eta^{1/2} (1 - \eta)^{-1/2}\big)$, these results can be interpreted to provide a numerical justification of our theoretical finding: at relatively high noise levels, the error is expected to increase with $\eta$ at a sharply increasing rational function rate above the linear rate. The variation of the error with $\rho$ at high noise levels plotted in Figure 5(b) is seen to be similar to that of the previous experiments.

We now evaluate our alignment accuracy results under Gaussian noise on face and digit images. First, the reference face pattern is obtained by approximating the face image shown in Figure 6(a) with 50 Gaussian atoms. The average atom coefficient magnitude of the face pattern is $|\lambda| = 0.14$, and the position and scale parameters of the pattern are in the range $[-0.9, 0.9]$ for $\tau_x, \tau_y$, and $[0.04, 1.1]$ for $\sigma_x, \sigma_y$. For the digit experiments, the reference pattern shown in Figure 8(a) is the approximation of a handwritten "5" digit with 20 Gaussian atoms. The pattern parameters are such that the average atom coefficient magnitude is $|\lambda| = 0.87$, and the position and scale parameters are given by $\tau_x, \tau_y \in [-0.7, 0.7]$ and $\sigma_x, \sigma_y \in [0.05, 1.23]$. The range of translation values $tT_x$ and $tT_y$ is selected as $[-1, 1]$ for both settings. In both settings, two different noise models are tested. First, the target patterns are corrupted with respect to the analytical Gaussian noise model $w$ of (28), where the noise parameters are set as $L = 750$, $\epsilon = 0.04$. Then, a digital Gaussian noise model is tested, where the pixels in the discrete representation of the images are corrupted with additive i.i.d. Gaussian noise having the same standard deviation $\eta$ as $w$. The digital Gaussian noise model is supposed to be well-approximated by the analytical noise model. Again, 50 target patterns are generated with random translations. The alignment errors are plotted with respect to $\rho$ in Figures 6 and 8, respectively for the face and digit patterns. The experimental error with the analytical noise model, the theoretical upper bound obtained for the analytical noise model, and the experimental
Figure 4: Alignment error of random patterns as a function of noise standard deviation $\eta$. Panels: (a), (b).

Figure 5: Alignment error of random patterns as functions of the noise standard deviation and the filter size, at high noise levels.

Figure 7: Alignment error of face pattern as a function of noise standard deviation $\eta$.
error with the digital noise model are plotted respectively in (b), (c) and (d) in both figures. The same errors are also plotted with respect to $\eta$ in Figures 7 and 9. The results are averaged over all target patterns.
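The digital Gaussian noise model described above amounts to adding independent zero-mean Gaussian samples of standard deviation $\eta$ to the pixels of the discretized image. A minimal sketch is given below; the function name `add_digital_gaussian_noise`, the fixed seed and the constant test image are assumptions made for the illustration.

```python
import numpy as np

def add_digital_gaussian_noise(image, eta, seed=0):
    """Return a copy of image with additive i.i.d. Gaussian pixel noise of std eta."""
    rng = np.random.default_rng(seed)
    return image + rng.normal(0.0, eta, size=image.shape)

# Hypothetical discrete image: a 200 x 200 array of zeros standing in for a sampled pattern.
image = np.zeros((200, 200))
noisy = add_digital_gaussian_noise(image, eta=0.05)
print(np.std(noisy - image))  # empirical noise standard deviation, close to eta
```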

The plots in Figures 6 and 8 show that the experimental and theoretical errors have a similar variation with respect to $\rho$. The dependence of the error on $\rho$ in these experiments seems to be different from that of the previous experiment of Figure 3. Although the theory predicts the variation $R_{t_0} = O\big(\rho^{3/2} (1 - \rho)^{-1/2}\big)$, this result is average and approximate. The exact variation of the error with $\rho$ may change between different individual patterns, as the constants of the variation function are determined by the actual pattern parameters. The plots of Figures 7 and 9 can be interpreted similarly. The similarity between the plots for the analytical and digital noise models suggests that the noise model $w$ used in this study provides a good approximation for the digital Gaussian noise, which is often encountered in digital imaging applications.
Figure 8: Digit pattern and alignment error as a function of filter size $\rho$.

Figure 9: Alignment error of digit pattern as a function of noise standard deviation $\eta$.

5.2.2 Generic noise model



In the second set of experiments we evaluate the results of Theorem 4 and Corollary 3 for the generic noise model $z$. In each experiment, the target patterns are generated by corrupting the reference pattern $p$ with a noise pattern $z$ and by applying random translations. In order to study the effect of the correlation between $z$ and the points on $\mathcal{M}(p)$ on the actual alignment error and on its theoretical bound, we consider two different settings. In the first setting, the noise pattern $z$ is chosen as a pattern that has high correlation with $p$. In particular, $z$ is constructed with a subset of the atoms used in $p$ with the same coefficients. The general bound $R_{u_0}$ is used in this setting. In the second setting, the noise $z$ is constructed with randomly selected Gaussian atoms so that it has small correlation with $p$. The bound $Q_{u_0}$ for the small correlation case is used in the second setting, where the correlation parameter $r_{pz}$ in (48) is computed numerically for obtaining the theoretical error bound. In both cases, the atom coefficients of $z$ are normalized such that the norm of $z$ is below the admissible noise level $\nu_0$. The theoretical bounds are then compared to the experimental errors for different values of the filter size $\rho$ and the noise level $\nu$.

We conduct the experiment on the random patterns used above, in Figures 3-5. The noise pattern $z$ is constructed with 10 atoms. The alignment errors are plotted in Figures 10 and 11 as a function of the filter size $\rho$ and the noise level $\nu$, respectively. The results are averaged over 50 reference patterns, and 50 target patterns for each reference pattern, which are created with random translations in the range $tT_x, tT_y \in [-4, 4]$. In both figures, the experimental alignment error and its theoretical bound are plotted in (a) and (b) for the first setting with high correlation and in (c) and (d) for the second setting with small correlation. The plots in Figure 10 show that the behavior of the theoretical upper bounds with respect to the increase in the filter size fits well the behavior of the actual error in both settings. The results are in accordance with Theorem 6, which states that $R_{u_0}$ and $Q_{u_0}$ are of $O\big((1 + \rho^2)^{1/2}\big)$. Next, looking at the plots in Figure 11, one can observe that the availability of an upper bound on the correlation between $z$ and the points on $\mathcal{M}(p)$ has the following benefits. First, comparing Figures 11(b) and 11(d), we see that, when a bound $r_{pz}$ on the correlation is known, the admissible noise level increases significantly (from around $\nu_0 = 0.01$ to $\nu_0 \approx 0.28$). Moreover, the comparison of the theoretical upper bounds $R_{u_0}$ and $Q_{u_0}$ with the actual errors shows that $Q_{u_0}$ is less pessimistic than $R_{u_0}$ since it makes use of the information of the maximum correlation. The comparison of the slopes of the experimental alignment errors in Figures 11(a) and 11(c) shows that, at the same noise level, the error is slightly smaller when $z$ has small correlation with the points on $\mathcal{M}(p)$. One can also observe from Figure 11 that the variation of the alignment error with $\nu$ bears resemblance to its variation with $\eta$ observed in the previous experiment of Figure 4. This is an expected result, as it has been seen in Theorem 6 that the dependences of $R_{u_0}$ and $Q_{u_0}$ on $\nu$ are the same as the dependence of $R_{t_0}$ on $\eta$.

Then, we repeat the same experiment with the face and digit patterns of the previous experiment. The noise pattern $z$ is constructed with 10 atoms in the face experiment and 5 atoms in the digit experiment. The errors obtained with the face pattern are plotted with respect to $\rho$ and $\nu$ in Figures 12 and 13, respectively; and the errors obtained with the digit pattern are presented in Figures 14 and 15 similarly. All results are averaged over 50 test patterns with random translations in the range $tT_x, tT_y \in [-1, 1]$. The results can be interpreted similarly to the experiment with random patterns and confirm its findings.

The overall conclusion of the experiments is that increasing the filter kernel size results in a bigger alignment error when the target image deviates from the translation manifold of the reference image due to noise. The results show also that the theoretical bounds for the alignment error capture well the order of dependence of the actual error on the noise level and the filter size, for both the Gaussian noise model and the generalized noise model. Also, the knowledge of the correlation between the noise pattern and the translated versions of the reference pattern is useful for improving the theoretical bound for the alignment error in the general setting.

We compute the bound for the second derivative of $p$ numerically by minimizing over $t$ and $T$. While the analytical bound is useful for the theoretical analysis as it has an open-form expression, the numerically computed bound is sharper.

Figure 10: Alignment error for random patterns and generic noise, as a function of filter size $\rho$.

Figure 11: Alignment error for random patterns and generic noise, as a function of noise magnitude $\nu$.

Figure 12: Alignment error for the face pattern and generic noise, as a function of filter size $\rho$.

Figure 13: Alignment error for the face pattern and generic noise, as a function of noise magnitude $\nu$.

Figure 14: Alignment error for the digit pattern and generic noise, as a function of filter size $\rho$.

Figure 15: Alignment error for the digit pattern and generic noise, as a function of noise magnitude $\nu$.



5.3 Application: Design of an optimal registration algorithm 

We now demonstrate the usage of our SIDEN estimate in the construction of a grid in the 
translation parameter domain that is used for image registration. In Section 3.2, we have derived
a set Q of translation vectors that can be correctly computed by minimizing the distance function
with descent methods, where Q is a subset of the SIDEN S corresponding to the noiseless
distance function. As discussed in the beginning of Section 4, in noisy settings, one can assume
that Q is also a subset of the perturbed SIDEN corresponding to the noisy distance function, 
provided that the noise level is sufficiently small. Therefore the estimate Q can be used in 
the registration of both noiseless and noisy images; small translations that are inside Q can be 
recovered with gradient descent minimization. However, the perfect alignment guarantee is lost 
for relatively large translations that are outside Q and the descent method may terminate in a 
local minimum other than the global minimum. Hence, in order to overcome this problem, we 
propose to construct a grid in the translation parameter domain and estimate large translations 
with the help of the grid. In particular, we describe a grid design procedure such that any 
translation vector tT lies inside the SIDEN of at least one grid point. Such a grid guarantees 
the recovery of the translation parameters if the distance function is minimized with a gradient 
descent method that is initialized with the grid points. In order to have a perfect recovery 
guarantee, each one of the grid points must be tested. However, as this is computationally 
costly, we use the following two-stage optimization instead, which offers a good compromise 
with respect to the accuracy-complexity tradeoff. First, we search for the grid vector that gives 
the smallest distance between the image pair, which results in a coarse alignment. Then, we 
refine the alignment with a gradient descent method initialized with this grid vector. In practice, 
this method is quite likely to give the optimal solution, which has been the case in all of our 
simulations. 
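The two-stage optimization described above can be sketched as follows. This is a toy illustration under several assumptions, not the implementation used in the experiments: the reference pattern is a single isotropic Gaussian atom, the grid spacing of 2 is chosen by hand rather than derived from the SIDEN estimate, the gradient is approximated by central differences, and a backtracking line search is used to pick the step size. All function names are hypothetical.

```python
import numpy as np

def pattern(X, Y, t):
    """Hypothetical reference pattern: one isotropic Gaussian atom translated by t."""
    return np.exp(-((X - t[0]) ** 2 + (Y - t[1]) ** 2))

def ssd(t, X, Y, target):
    """Squared distance between the reference translated by t and the target."""
    return float(np.sum((pattern(X, Y, t) - target) ** 2))

def num_grad(f, t, h=1e-5):
    """Central-difference approximation of the gradient of f at t."""
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = h
        g[i] = (f(t + e) - f(t - e)) / (2 * h)
    return g

def two_stage_register(f, grid_points, iters=200):
    # Stage 1: coarse alignment, keep the grid vector with the smallest distance.
    t = min(grid_points, key=f).astype(float)
    # Stage 2: refine with gradient descent and a backtracking (Armijo) line search.
    for _ in range(iters):
        g = num_grad(f, t)
        if np.linalg.norm(g) < 1e-9:
            break
        step, ft = 1.0, f(t)
        while step > 1e-12 and f(t - step * g) > ft - 0.5 * step * (g @ g):
            step *= 0.5
        if step <= 1e-12:
            break
        t = t - step * g
    return t

axis = np.arange(-8.0, 8.0 + 1e-9, 0.25)          # pixel coordinates
X, Y = np.meshgrid(axis, axis)
t_true = np.array([2.3, -1.6])                    # translation to recover
target = pattern(X, Y, t_true)                    # noiseless target image
grid_points = [np.array([a, b]) for a in range(-4, 5, 2) for b in range(-4, 5, 2)]
t_est = two_stage_register(lambda t: ssd(t, X, Y, target), grid_points)
print(t_est)
```

Only the best grid vector is refined, which reflects the compromise discussed above: testing every grid point would preserve the formal recovery guarantee, at a cost proportional to the number of grid points.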

We now explain the construction of the grid. First, notice that $a_{jk}(T) = a_{jk}(-T)$ and $b_{jk}(T) = -b_{jk}(-T)$. Therefore, the function $s_{jk}(t)$ given in (25) is the same for $T$ and $-T$ by symmetry. As $Q_{jk}$ does not depend on $T$, from the form of $df(tT)/dt$ in (24) we have $df(tT)/dt = df(-tT)/dt$. Hence, the SIDEN is symmetric with respect to the origin. It is also easy to check that the estimation $\delta_T$ of the SIDEN boundary along the direction $T$ satisfies $\delta_T = \delta_{-T}$. One can easily determine a grid unit in the form of a parallelogram that lies completely inside the estimate $Q$ of the SIDEN and tile the $(tT_x, tT_y)$-plane with these grid units. This defines a regular grid in the $(tT_x, tT_y)$-plane such that each point of the plane lies inside the SIDEN of at least one grid point. Note that the complexity of image registration based on a grid search is given by the number of grid points. In our case, the number of grid points is determined by the area of $Q$; therefore, the alignment complexity depends on the well-behavedness of the distance function $f$. In particular, as $V(Q)$ increases with the filter size, the area of the grid units expands at the rate $O(1 + \rho^2)$ and the number of grid points decreases at the rate $O\big((1 + \rho^2)^{-1}\big)$ with $\rho$. Therefore, the alignment complexity with the proposed method is of $O\big((1 + \rho^2)^{-1}\big)$.
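The complexity argument can be illustrated with a short computation. The sketch assumes, for simplicity, that the grid unit area is exactly proportional to $1 + \rho^2$ (the text only states this as an order of growth); the function name `grid_point_count`, the search domain area and the unit area at $\rho = 0$ are hypothetical values.

```python
import math

def grid_point_count(domain_area, unit_area_at_zero, rho):
    """Number of grid units needed to tile a search domain of the given area."""
    unit_area = unit_area_at_zero * (1 + rho ** 2)   # assumed exact proportionality
    return math.ceil(domain_area / unit_area)

# Hypothetical search domain [-4, 4] x [-4, 4] and unit area 0.25 without smoothing.
for rho in (0.0, 1.0, 2.0, 3.0):
    print(rho, grid_point_count(64.0, 0.25, rho))
```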

The construction of a regular grid in this manner is demonstrated in Figure 16 for the image of the "5" digit used in the experiments of Section 5.2. In Figure 16(a), the reference pattern and its translated versions corresponding to the neighboring grid points in the first and second directions of sampling are shown. In Figure 16(b), the reference pattern is shown when smoothed with a filter of size $\rho = 0.15$, as well as the neighboring patterns in the smoothed grid. The original grid and the smoothed grid for $\rho = 0.15$ are displayed in Figures 16(c) and 16(d), where the estimation of the SIDEN and the grid units are also plotted. One can observe that smoothing the pattern is helpful for obtaining a coarser grid that reduces the computational complexity of image registration in hierarchical methods. The distance functions $f(tT)$ and $\hat{f}(tT)$ of the original and smoothed digit patterns are plotted in Figure 17. The two plots show that smoothing eliminates undesired local extrema of the distance function and therefore expands the SIDEN.
Figure 16: Construction of a regular grid in parameter domain with an exact alignment guarantee. (a) Reference digit pattern with 20 Gaussian atoms and its translated versions corresponding to neighboring grid points. (b) Smoothed digit pattern and the neighboring images on the smoothed grid. (c) Original grid. (d) Smoothed grid.



Then, in order to demonstrate the relation between alignment complexity and filtering, we build multiresolution grids corresponding to different filter sizes and plot the variation of the number of grid points with the filter size. The results obtained with the random patterns and the face and digit patterns used in Section 5.2 are presented in Figure 18. The results show that the number of grid points decreases monotonically with the filter size, as predicted by our analysis, which suggests that the number of grid points must be of $O\big((1 + \rho^2)^{-1}\big)$.

Finally, we remark that the performance guarantee of this two-stage registration approach is confirmed by the experiments of Section 5.2, which use this registration scheme. In the plots of Figures 3-5, where the coarse alignment in each experiment is done with the help of a grid adapted to the filter size using the grid design procedure explained above, we see that the proposed registration technique results in an alignment error of 0 for the noiseless case ($\eta = 0$ or $\nu = 0$) for all values of the filter size $\rho$. These alignment error results, together with the grid size plots of Figure 18, show that increasing the filter size reduces the alignment complexity while retaining the perfect alignment guarantee in the noiseless case. Figures 3-5 show also that the proposed grid can be successfully used in noisy settings. The alignment error in these experiments stems solely from the change in the global minimum of the distance function due to noise, and not from the grid; otherwise, we would observe much higher alignment errors that are comparable to the distance between neighboring grid points. This confirms that the estimate $Q$ remains in the perturbed SIDEN and its usage does not lead to an additional alignment error if the noise level is relatively small.

Figure 17: The variation of the distance function with smoothing. (a) Original distance function $f(tT)$. (b) Smoothed distance function $\hat{f}(tT)$ for $\rho = 0.15$.

6 Discussion of Results 

The results of our analysis show that smoothing improves the regularity of alignment by increas- 
ing the range of translation values that are computable with descent-type methods. However, 
in the presence of noise, smoothing has a negative influence on the accuracy of alignment as it 
amplifies the alignment error caused by the image noise; and this increases with the increase 
in the filter size. Therefore, considering the computation cost - accuracy tradeoff, the optimal 
filter size in image alignment with descent methods must be chosen by taking into account the 
deviation between the target pattern and the translation manifold of the reference pattern; i.e., 
the expected noise level. 

Our study constitutes a theoretical justification of the principle behind hierarchical registra- 
tion techniques that use local optimizers. Coarse scales are favorable at the beginning of the 
alignment as they permit the computation of large translation amounts with low complexity 
using simple local optimizers; however, over-filtering decreases the accuracy of alignment as 
the target image is in general not exactly a translated version of the reference image. This is 
compensated for at finer scales where less filtering is applied, thus avoiding the amplification of 
the alignment error resulting from noise. Since images are already roughly registered at coarse 
scales, at fine scales the remaining increment to be added to the translation parameters for 
fine-tuning the alignment is small; it can be achieved at a relatively low complexity in a small 
search region. 

We now interpret the findings of our work in comparison with some previous results. We start with the article [13] by Robinson et al., which studies the Cramér-Rao lower bound (CRLB) of the registration. Since the CRLB is related to the inverse of the Fisher information matrix (FIM) $J$, the authors suggest to use the trace of $J^{-1}$ as a general lower bound for the MSE of the translation estimation. Therefore, the square root of $\mathrm{tr}(J^{-1})$ can be considered as a lower bound for the alignment error. It has been shown in [13] that $\sqrt{\mathrm{tr}(J^{-1})} = O(\eta)$, where $\eta$ is the standard deviation of the Gaussian noise. In fact, this result tells that the alignment error with any estimator is lower bounded by $O(\eta)$, i.e., its dependence on the noise level is at least linear. Meanwhile, our study, which focuses on estimators that minimize the SSD with local optimizers, concludes that the alignment error is at most $O\big(\sqrt{\eta/(1 - \eta)}\big)$ for these estimators. Notice that for small $\eta$, $O\big(\sqrt{\eta/(1 - \eta)}\big) \approx O(\sqrt{\eta}) > O(\eta)$; and for large $\eta$, we still have $O\big(\sqrt{\eta/(1 - \eta)}\big) > O(\eta)$ due to the sharply increasing rational function form of the bound. Therefore, the result in [13] and our results are consistent and complementary, pointing together to the fact that the error of an estimator performing a local optimization of the SSD must lie between $O(\eta)$ and $O\big(\sqrt{\eta/(1 - \eta)}\big)$. Note also that, as it has been seen in the experiments of Figure 5(a), the error of this type of estimators may indeed increase with $\eta$ at a rate above $O(\eta)$ in practice, as predicted by our upper bound. Next, as for the effect of filtering on the estimation accuracy, the authors of [13] experimentally observe that $\mathrm{tr}(J^{-1})$ decreases as the image bandwidth increases, which suggests that the lower bound on the MSE of a translation estimator is smaller when the image has more high-frequency components. This is stated more formally in [23]. It is shown that the estimation of the $x$ component of the translation has variance larger than $\eta^2 / \|\partial p/\partial x\|^2$, and similarly for the $y$ component, where $\partial p/\partial x$ is the partial derivative of the pattern $p$ with respect to the spatial variable $x$. Therefore, as smoothing decreases the norm of the partial derivatives of the pattern, it leads to an increase in the variance of the estimation. These observations are also consistent with our theoretical results.

Figure 18: Number of grid points. The decay rate is of $O\big((1 + \rho^2)^{-1}\big)$. (a) Random patterns. (b) Face pattern. (c) Digit pattern.
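The effect of smoothing on the variance lower bound $\eta^2 / \|\partial p/\partial x\|^2$ can be illustrated numerically. The sketch below is a toy example under stated assumptions: the pattern is a single isotropic Gaussian atom $\exp(-\|X\|^2/\sigma^2)$, smoothing is modeled by the same rule as in (51)-(52) (squared scale increased by $\rho^2$, coefficient multiplied by $\sigma^2/(\sigma^2 + \rho^2)$), and the derivative is approximated by finite differences on a pixel grid. The function names and numerical values are hypothetical.

```python
import numpy as np

def smoothed_atom(X, Y, sigma, rho):
    """Isotropic Gaussian atom of scale sigma after smoothing with filter size rho."""
    s2 = sigma ** 2 + rho ** 2
    return (sigma ** 2 / s2) * np.exp(-(X ** 2 + Y ** 2) / s2)

def variance_lower_bound(eta, sigma, rho, half_width=10.0, step=0.05):
    """Evaluate eta^2 / ||dp/dx||^2 for the smoothed atom on a pixel grid."""
    axis = np.arange(-half_width, half_width + step / 2, step)
    X, Y = np.meshgrid(axis, axis)
    p = smoothed_atom(X, Y, sigma, rho)
    dp_dx = np.gradient(p, step, axis=1)          # x varies along axis 1
    energy = np.sum(dp_dx ** 2) * step ** 2       # approximates the squared L2 norm
    return eta ** 2 / energy

for rho in (0.0, 0.5, 1.0, 2.0):
    print(rho, variance_lower_bound(eta=0.1, sigma=1.0, rho=rho))
```

In this toy setting the bound grows with $\rho$, since the smoothed atom has a smaller derivative norm; for $\sigma = \rho = 1$ the bound is about four times its unsmoothed value.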

Next, we discuss some results from the recent article [24], which presents a continuous-domain
noise analysis of block-matching. The blocks are assumed to be corrupted with additive
Gaussian noise, and the disparity estimate is given by the global minimum of the noisy distance
function, as in our work. Although there are differences between their setting and ours, such as the
horizontal and non-constant disparity field assumption in [24], it is interesting to compare their
results with ours. We consider the disparity of the block to be constant in [24], so that it fits
our global translation assumption. In [24], an analysis of the deviation between the estimated
disparity and the true disparity is given, which is similar to the distance between the global
minima of $f$ and $g$ in our work. Keeping the notation of our text, let us denote this deviation
by $t_0$. In Theorem 3.2 of [24], $t_0$ is estimated as the sum of three error terms, where the
variances of the first and second terms are respectively of $O(\eta^2)$ and $O(\eta^4)$ with respect to the
noise standard deviation $\eta$. These two terms are stated as dominant noise terms. The third
term represents the high-order Taylor terms of some approximations made in the derivations,
and it depends on the value of $t_0$ itself. As the overall estimation of $t_0$ is formulated
using the term itself, their main result is interesting especially for small values of $t_0$, since
the third term is then negligible. It can be concluded from [24] that $t_0 \approx O(\eta + \eta^2 + \text{H.O.})$,
where H.O. represents high-order terms. This result is consistent with the CRLB of $t_0$ in [13],
stating that $t_0$ is at least of $O(\eta)$, and with our upper bound $O(\sqrt{\eta/(1-\eta)})$. The analysis in [24] and
ours can be compared in the following way. First, since our derivation is rigorous and
does not neglect high-order terms, these terms manifest themselves in the rational function form
of the resulting bound. Meanwhile, they are represented as H.O. and not explicitly examined
in the estimation $O(\eta + \eta^2 + \text{H.O.})$ in [24]. For small $\eta$, this result can be approximated as
$t_0 = O(\eta)$, while our result states that $t_0 \le O(\sqrt{\eta})$. As $\sqrt{\eta} > \eta$ for small $\eta$, this also gives a
consistent comparison, since our estimation is an upper bound and the one in [24] is not. Indeed,
the experimental results in [24] suggest that their derivation gives a slight underestimation of
the error. Lastly, our noise analysis treats the image alignment problem in a multiscale setting
and analyzes the joint variation of the error with the noise level $\eta$ and the filter size $\rho$, whereas
the study in [24] only concentrates on the relation between the error and the noise level.

Finally, we mention some facts from scale-space theory [27], which may be useful for the
interpretation of our findings regarding the variation of the SIDEN with the filter size $\rho$. The
scale-space representation of a signal is given by convolving it with kernels of variable scale.
The most popular convolution kernel is the Gaussian kernel, as it has been shown that under
some "well-behavedness" constraints, it is the unique kernel for generating a scale-space. An
important result in scale-space theory is [3H], which states that the number of local extrema of a
1-D function is a decreasing function of $\rho$. This provides a mathematical characterization of the
well-known smoothing property of the Gaussian kernel. However, it is known that this cannot
be generalized to higher-dimensional signals; e.g., there are no nontrivial kernels on $\mathbb{R}^2$ with the
property of never introducing new local extrema when the scale increases [27]. One interesting
result that can possibly be related to our analysis is about the density of local extrema of a
signal as a function of scale. In order to gain an intuition about the behavior of local extrema,
the variation of the local extrema is examined in [27] for 1-D continuous white noise and fractal
noise processes. It has been shown that the expected density of the local minima of these
signals decreases at rate $\rho^{-1}$. In the estimation of the SIDEN in our work, we have analyzed
how the first zero crossing of $df(tT)/dt$ along a direction $T$ around the origin varies with the
scale. Therefore, what we have examined is the distance $f$ between the scale space $p(X)$ of
an image and the scale space $p(X - tT)$ of its translated version. Since this is different from
the scale-space of the distance function $f$ itself, it is not possible to compare the result in [27]
directly to ours. However, we can observe the following. Restricting $f$ to a specific direction
$T_a$ so that we have a 1-D function $f(tT_a)$ of $t$ as in [27], our estimation for the stationary point
of $f(tT_a)$ closest to the origin expands at a rate of $O((1 + \rho^2)^{1/2})$, which is $O(\rho)$ for large $\rho$.
One can reasonably expect this distance to be roughly inversely proportional to the density of
the local extrema of $f$. This leads to the conclusion that the density of the distance extrema is
expected to be around $O(\rho^{-1})$, which interestingly matches the density obtained in [27].

Footnote (to the variance bound of [23] quoted above): This bound is obtained by assuming Gaussian noise on both reference and target patterns and employing the CRLB.

7 Conclusion 

We have presented a theoretical analysis of image alignment with descent-type local minimizers, 
where we have specifically focused on the effect of low-pass filtering and noise on the regularity 
and accuracy of alignment. First, we have examined the problem of aligning with gradient 
descent a reference and a target pattern that differ by a two-dimensional translation. We have 
derived a lower bound for the range of translations for which the reference pattern can be 
exactly aligned with its translated versions, and investigated how this region varies with the 
filter size when the images are smoothed. Our finding is that the volume of this region increases 
quadratically with the filter size, showing that smoothing the patterns improves the regularity of 
alignment. Then, we have considered a setting with noisy target images and examined Gaussian 
noise and arbitrary noise patterns, which may find use in different imaging applications. We have 
derived a bound for the alignment error and studied the dependence of the error on the noise
level and the filter size. Our main results state that the alignment error bound is proportional 
to the square root of the noise level at small noise, whereas this order of dependence increases 
at larger noise levels. More interestingly, the alignment error is also significantly affected by 
the filter size. The probabilistic error bound obtained with the Gaussian noise model has 
been seen to increase with the filter size at a sharply increasing rational function rate, whereas 
the deterministic bound obtained for arbitrary noise patterns of deterministic norm increases 
approximately linearly with the filter size. These theoretical findings are also confirmed by 
experiments. To the best of our knowledge, none of the previous works about image registration 
has studied the alignment regularity problem. Meanwhile, our alignment accuracy analysis is 
consistent with previous results, provides a more rigorous treatment, and studies the problem in 
a multiscale setting unlike the previous works. The results of our study show that, in multiscale 
image registration, filtering the images with large filter kernels improves the alignment regularity 
in early phases, while the use of smaller filters improves the accuracy at later phases. From this 
aspect, our estimations of the regularity and accuracy of alignment in terms of the noise and 
filter parameters provide insight for the principles behind hierarchical registration techniques 
and may find use in the design of efficient, low-complexity registration algorithms. 
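A Proofs of the results on SIDEN estimation

A.1 Proof of Theorem 1

Proof. From the pattern model and the definition of the distance function, the distance between $p(X)$ and its translated version $p(X - tT)$ is given by

\[
\begin{aligned}
f(tT) &= \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big)^2 \, dX \\
&= \int_{\mathbb{R}^2} \sum_{j=1}^{K} \lambda_j \big(\phi_{\gamma_j}(X) - \phi_{\gamma_j}(X - tT)\big) \sum_{k=1}^{K} \lambda_k \big(\phi_{\gamma_k}(X) - \phi_{\gamma_k}(X - tT)\big) \, dX \\
&= \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k \int_{\mathbb{R}^2} \big(\phi_{\gamma_j}(X) - \phi_{\gamma_j}(X - tT)\big) \big(\phi_{\gamma_k}(X) - \phi_{\gamma_k}(X - tT)\big) \, dX.
\end{aligned}
\]

Therefore,

\[
\frac{df(tT)}{dt} = \sum_{j=1}^{K} \sum_{k=1}^{K} \frac{dI_{jk}(t)}{dt} \tag{56}
\]

where

\[
I_{jk}(t) = \lambda_j \lambda_k \int_{\mathbb{R}^2} \big(\phi_{\gamma_j}(X) - \phi_{\gamma_j}(X - tT)\big) \big(\phi_{\gamma_k}(X) - \phi_{\gamma_k}(X - tT)\big) \, dX
\]

and

\[
\frac{dI_{jk}(t)}{dt} = \lambda_j \lambda_k \left( -\frac{d}{dt} \int_{\mathbb{R}^2} \phi_{\gamma_j}(X)\, \phi_{\gamma_k}(X - tT) \, dX - \frac{d}{dt} \int_{\mathbb{R}^2} \phi_{\gamma_j}(X - tT)\, \phi_{\gamma_k}(X) \, dX \right) \tag{57}
\]

since the other terms do not depend on $t$. From Proposition 2 we have

\[
\frac{d}{dt} \int_{\mathbb{R}^2} \phi_{\gamma_j}(X)\, \phi_{\gamma_k}(X - tT) \, dX = -Q_{jk}\, e^{-(a_{jk} t^2 + 2 b_{jk} t)} (a_{jk} t + b_{jk}).
\]

Similarly, one can obtain

\[
\frac{d}{dt} \int_{\mathbb{R}^2} \phi_{\gamma_j}(X - tT)\, \phi_{\gamma_k}(X) \, dX = -Q_{jk}\, e^{-(a_{jk} t^2 - 2 b_{jk} t)} (a_{jk} t - b_{jk}).
\]

Therefore, we can rewrite (57) as

\[
\frac{dI_{jk}(t)}{dt} = \lambda_j \lambda_k\, Q_{jk}\, s_{jk}(t)
\]

where

\[
s_{jk}(t) = e^{-(a_{jk} t^2 + 2 b_{jk} t)} (a_{jk} t + b_{jk}) + e^{-(a_{jk} t^2 - 2 b_{jk} t)} (a_{jk} t - b_{jk}). \tag{58}
\]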
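Denoting

\[
J^- = \{(j,k) : \lambda_j \lambda_k < 0\}, \qquad J^+ = \{(j,k) : \lambda_j \lambda_k > 0\},
\]

for a given direction $T$, we would like to compute a $\delta_T$ which guarantees that

\[
\frac{df(tT)}{dt} = \sum_{j=1}^{K} \sum_{k=1}^{K} \frac{dI_{jk}(t)}{dt} > 0 \tag{59}
\]
\[
\Longleftrightarrow \ \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk}\, s_{jk}(t) > 0 \tag{60}
\]
\[
\Longleftrightarrow \ \sum_{(j,k) \in J^+} \lambda_j \lambda_k Q_{jk}\, s_{jk}(t) + \sum_{(j,k) \in J^-} |\lambda_j \lambda_k|\, Q_{jk} \big(-s_{jk}(t)\big) > 0 \tag{61}
\]

for all $0 < t < \delta_T$. Using the Taylor expansion of $s_{jk}(t)$, one can show that

\[
s_{jk}(t) = p_{jk}(t) + R_{jk}(t)
\]

where

\[
p_{jk}(t) = (2 a_{jk} - 4 b_{jk}^2)\, t + \left( -\tfrac{8}{3} b_{jk}^4 + 8 b_{jk}^2 a_{jk} - 2 a_{jk}^2 \right) t^3
\]

is a 3rd-order polynomial term, and the magnitude of the remainder term $R_{jk}(t)$ is bounded by

\[
|R_{jk}(t)| \le M_{jk}(t) = 1.37 \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^4 .
\]

Since $s_{jk}(t) \ge p_{jk}(t) - M_{jk}(t)$ and $-s_{jk}(t) \ge -p_{jk}(t) - M_{jk}(t)$, the condition in (61) is satisfied if

\[
\begin{aligned}
& \sum_{(j,k) \in J^+} \lambda_j \lambda_k Q_{jk} \big(p_{jk}(t) - M_{jk}(t)\big) + \sum_{(j,k) \in J^-} |\lambda_j \lambda_k|\, Q_{jk} \big(-p_{jk}(t) - M_{jk}(t)\big) > 0 \\
\Longleftrightarrow \ & \sum_{(j,k) \in J^+} \lambda_j \lambda_k Q_{jk} \left( (2 a_{jk} - 4 b_{jk}^2)\, t + \left( -\tfrac{8}{3} b_{jk}^4 + 8 b_{jk}^2 a_{jk} - 2 a_{jk}^2 \right) t^3 - 1.37 \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^4 \right) \\
& + \sum_{(j,k) \in J^-} |\lambda_j \lambda_k|\, Q_{jk} \left( -(2 a_{jk} - 4 b_{jk}^2)\, t - \left( -\tfrac{8}{3} b_{jk}^4 + 8 b_{jk}^2 a_{jk} - 2 a_{jk}^2 \right) t^3 - 1.37 \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^4 \right) > 0 \\
\Longleftrightarrow \ & \alpha_1 t + \alpha_3 t^3 + \alpha_4 t^4 > 0.
\end{aligned} \tag{62}
\]

Notice that $\alpha_1 > 0$, since it is the coefficient of the first term in the Taylor expansion of $df/dt$
around $t = 0$, and $df/dt$ is positive in an interval $(0, \epsilon)$ as $f(t)$ has a minimum at $t = 0$. The
condition in (62) is equivalent to the condition $\alpha_4 t^3 + \alpha_3 t^2 + \alpha_1 > 0$, or $|\alpha_4| t^3 - \alpha_3 t^2 - \alpha_1 < 0$.
Then, one can observe that the polynomial $|\alpha_4| t^3 - \alpha_3 t^2 - \alpha_1$ has a positive coefficient for the
third-degree term, it has a critical point at $t = 0$, and its value at $t = 0$ is negative. Since
this polynomial cannot have more than two critical points, it has one and only one positive
root, which is $\delta_T$. This gives the sought bound in the direction of $T$. Considering all possible
directions $T \in S^1$ on the unit circle, we obtain

\[
\mathcal{Q} = \{ tT : T \in S^1,\ 0 \le t \le \delta_T \}.
\]

□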
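The Taylor step used in the proof above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes scalar test values for $a_{jk} > 0$ and $b_{jk}$ (chosen here purely for illustration), evaluates $s_{jk}(t)$ from (58), its third-order Taylor polynomial, and the remainder bound with the constant 1.37, and reports the largest amount by which the remainder exceeds the bound on a grid. The function names are hypothetical.

```python
import numpy as np


def s_jk(t, a, b):
    """s_jk(t) as in (58), for scalar a > 0 and b."""
    return (np.exp(-(a * t**2 + 2 * b * t)) * (a * t + b)
            + np.exp(-(a * t**2 - 2 * b * t)) * (a * t - b))


def p_jk(t, a, b):
    """Third-order Taylor polynomial of s_jk around t = 0."""
    return ((2 * a - 4 * b**2) * t
            + (-8.0 / 3.0 * b**4 + 8 * b**2 * a - 2 * a**2) * t**3)


def m_jk(t, a, b):
    """Remainder bound M_jk(t) = 1.37 exp(b^2 / a) a^(5/2) t^4."""
    return 1.37 * np.exp(b**2 / a) * a**2.5 * t**4


def max_excess(a, b, t_max=5.0, n=2001):
    """Largest value of |s_jk - p_jk| - M_jk on a grid of t in [0, t_max].

    A non-positive result (up to rounding) means the bound holds on the grid.
    """
    t = np.linspace(0.0, t_max, n)
    return float(np.max(np.abs(s_jk(t, a, b) - p_jk(t, a, b)) - m_jk(t, a, b)))
```

A grid check of this kind does not replace the analytical argument; it only guards against transcription errors in the polynomial coefficients and the constant.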
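A.2 Proof of Theorem 2

Proof. Given a direction $T$, one must determine the order of dependence of $\hat{\delta}_T$ on $\rho$, where hats denote the quantities associated with the smoothed pattern. We first
examine the terms $\hat{Q}_{jk}$, $\hat{a}_{jk}$, $\hat{b}_{jk}$, $\hat{c}_{jk}$:

\[
\hat{Q}_{jk} = \frac{\pi\, |\hat{\sigma}_j \hat{\sigma}_k|\, e^{-\hat{c}_{jk}}}{\sqrt{|\hat{\Sigma}_{jk}|}} \tag{63}
\]
\[
\hat{a}_{jk} = \frac{1}{2}\, T^T \hat{\Sigma}_{jk}^{-1}\, T \tag{64}
\]
\[
\hat{b}_{jk} = \frac{1}{2}\, T^T \hat{\Sigma}_{jk}^{-1} (\tau_k - \tau_j) \tag{65}
\]
\[
\hat{c}_{jk} = \frac{1}{2}\, (\tau_k - \tau_j)^T \hat{\Sigma}_{jk}^{-1} (\tau_k - \tau_j) \tag{66}
\]
\[
\hat{\Sigma}_{jk} = \frac{1}{2} \left( \Psi_j \hat{\sigma}_j^2 \Psi_j^{-1} + \Psi_k \hat{\sigma}_k^2 \Psi_k^{-1} \right). \tag{67}
\]

Firstly, from (67), it is seen that

\[
\lambda_{\min}(\hat{\Sigma}_{jk}) \ge \frac{1}{2} \left( \lambda_{\min}(\hat{\sigma}_j^2) + \lambda_{\min}(\hat{\sigma}_k^2) \right), \qquad
\lambda_{\max}(\hat{\Sigma}_{jk}) \le \frac{1}{2} \left( \lambda_{\max}(\hat{\sigma}_j^2) + \lambda_{\max}(\hat{\sigma}_k^2) \right)
\]

where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ denote the minimum and maximum eigenvalues of a matrix. Since
$\hat{\sigma}_j = \mathrm{diag}\big(\sqrt{\sigma_{x,j}^2 + \rho^2},\ \sqrt{\sigma_{y,j}^2 + \rho^2}\big)$, the dependence of the eigenvalues of $\hat{\sigma}_j$ on $\rho$ is given by

\[
\lambda_{\min}(\hat{\sigma}_j) = O\big(\sqrt{1 + \rho^2}\big), \qquad \lambda_{\max}(\hat{\sigma}_j) = O\big(\sqrt{1 + \rho^2}\big).
\]

Therefore,

\[
\lambda_{\min}(\hat{\Sigma}_{jk}) = O(1 + \rho^2), \qquad \lambda_{\max}(\hat{\Sigma}_{jk}) = O(1 + \rho^2)
\]

and

\[
\lambda_{\min}(\hat{\Sigma}_{jk}^{-1}) = O\big((1 + \rho^2)^{-1}\big), \qquad \lambda_{\max}(\hat{\Sigma}_{jk}^{-1}) = O\big((1 + \rho^2)^{-1}\big). \tag{68}
\]

Then, from (64)-(66) we have

\[
\hat{a}_{jk} = O\big((1 + \rho^2)^{-1}\big), \qquad \hat{b}_{jk} = O\big((1 + \rho^2)^{-1}\big), \qquad \hat{c}_{jk} = O\big((1 + \rho^2)^{-1}\big).
\]

Next, we look at the term $\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk}$. From (17) and (63), we have

\[
\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} = \lambda_j \lambda_k\, \frac{\pi\, |\sigma_j \sigma_k|\, e^{-\hat{c}_{jk}}}{\sqrt{|\hat{\Sigma}_{jk}|}}
\]

whose variation with $\rho$ depends on the terms $e^{-\hat{c}_{jk}}$ and $\sqrt{|\hat{\Sigma}_{jk}|}$. The term $e^{-\hat{c}_{jk}}$ in the numerator
is bounded and asymptotically approaches 1 as $\rho$ increases, whereas the term $\sqrt{|\hat{\Sigma}_{jk}|}$ in the
denominator approaches infinity as $\rho$ increases. Therefore, the order of $\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk}$ is determined
by $\sqrt{|\hat{\Sigma}_{jk}|}$, which gives

\[
\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} = O\big((1 + \rho^2)^{-1}\big). \tag{69}
\]

From (21)-(22) and the relations derived above, $\hat{\alpha}_1 = O\big((1 + \rho^2)^{-2}\big)$ and $\hat{\alpha}_3 = O\big((1 + \rho^2)^{-3}\big)$.
Then, the term

\[
\exp\!\left( \frac{\hat{b}_{jk}^2}{\hat{a}_{jk}} - \hat{c}_{jk} \right) = O\big(\exp((1 + \rho^2)^{-1})\big)
\]

in (23) is bounded and approaches 1 as $\rho$ increases, whereas the remaining term

\[
|\lambda_j \lambda_k|\, \frac{\pi\, |\sigma_j \sigma_k|}{\sqrt{|\hat{\Sigma}_{jk}|}}\, \hat{a}_{jk}^{5/2}
\]

in the expression of $\hat{\alpha}_4$ is of $O\big((1 + \rho^2)^{-7/2}\big)$. Therefore, $\hat{\alpha}_4 = O\big((1 + \rho^2)^{-7/2}\big)$.
Now, as $\hat{\alpha}_4 < 0$, $\hat{\delta}_T$ is the positive root of the polynomial

\[
t^3 + \frac{\hat{\alpha}_3}{\hat{\alpha}_4}\, t^2 + \frac{\hat{\alpha}_1}{\hat{\alpha}_4}.
\]

Its roots are given by

\[
t_1 = A + q_0 B + q_0 C, \qquad t_2 = A + q_1 B + q_2 C, \qquad t_3 = A + q_2 B + q_1 C
\]

where

\[
q_0 = 1, \qquad q_1 = \frac{-1 + i\sqrt{3}}{2}, \qquad q_2 = \frac{-1 - i\sqrt{3}}{2},
\]
\[
A = -\frac{\hat{\alpha}_3}{3 \hat{\alpha}_4}, \qquad
B = \frac{1}{3} \sqrt[3]{\frac{R + \sqrt{R^2 - 4 Q^3}}{2}}, \qquad
C = \frac{1}{3} \sqrt[3]{\frac{R - \sqrt{R^2 - 4 Q^3}}{2}},
\]
\[
R = -2 \left( \frac{\hat{\alpha}_3}{\hat{\alpha}_4} \right)^3 - 27\, \frac{\hat{\alpha}_1}{\hat{\alpha}_4}, \qquad
Q = \left( \frac{\hat{\alpha}_3}{\hat{\alpha}_4} \right)^2 .
\]

From the above, it is seen that $R = O\big((1 + \rho^2)^{3/2}\big)$ and $Q = O(1 + \rho^2)$. Therefore, $\sqrt{R^2 - 4 Q^3} = O\big((1 + \rho^2)^{3/2}\big)$. This gives

\[
A = O\big((1 + \rho^2)^{1/2}\big), \qquad B = O\big((1 + \rho^2)^{1/2}\big), \qquad C = O\big((1 + \rho^2)^{1/2}\big).
\]

Thus, the roots $t_1$, $t_2$ and $t_3$ of the polynomial are also of $O\big((1 + \rho^2)^{1/2}\big)$, which shows that

\[
\hat{\delta}_T = O\big((1 + \rho^2)^{1/2}\big)
\]

for every $T$. As $\hat{\mathcal{Q}}$ is two-dimensional, we finally obtain the variation of its volume with $\rho$ as

\[
V(\hat{\mathcal{Q}}) = O(1 + \rho^2).
\]

□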
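The scaling argument above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the constants `c1`, `c3`, `c4` are hypothetical, and the coefficients are set to follow exactly the orders derived in the proof, $\alpha_1 = c_1 (1+\rho^2)^{-2}$, $\alpha_3 = c_3 (1+\rho^2)^{-3}$, $\alpha_4 = -c_4 (1+\rho^2)^{-7/2}$. Under this model the positive root of $|\alpha_4| t^3 - \alpha_3 t^2 - \alpha_1$ equals a constant times $\sqrt{1+\rho^2}$.

```python
import numpy as np


def delta_T(alpha1, alpha3, alpha4):
    """Positive real root of |alpha4| t^3 - alpha3 t^2 - alpha1.

    Assumes alpha1 > 0 and alpha4 < 0, as in the proof.
    """
    roots = np.roots([abs(alpha4), -alpha3, 0.0, -alpha1])
    is_real = np.abs(roots.imag) < 1e-9 * (1.0 + np.abs(roots))
    real_roots = roots[is_real].real
    return float(real_roots[real_roots > 0].min())


def scaled_delta(rho, c1=1.0, c3=0.5, c4=2.0):
    """delta_T when the coefficients follow the orders derived in the proof.

    c1, c3, c4 are hypothetical constants used only for this illustration.
    """
    g = 1.0 + rho**2
    return delta_T(c1 * g**-2, c3 * g**-3, -c4 * g**-3.5)


if __name__ == "__main__":
    for rho in (0.0, 1.0, 5.0, 20.0):
        d = scaled_delta(rho)
        print(rho, d, d / np.sqrt(1.0 + rho**2))
```

The last printed column stays constant across filter sizes, which is the $O((1+\rho^2)^{1/2})$ behavior of $\delta_T$ claimed in the theorem for this idealized coefficient model.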

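A.3 Proof of Proposition 4

Proof. In order to show the desired property, we will first show that the condition $a_{jk} > 2 b_{jk}^2$
for a given direction $T$ guarantees that $\frac{df(tT)}{dt}$ has no zero-crossings along that direction; i.e.,
$\frac{df(tT)}{dt} > 0$ for all $t > 0$. We will then verify that the condition $\hat{a}_{jk} > 2 \hat{b}_{jk}^2$ for the smoothed
pattern is necessarily met along all directions $T$ if the filter size is big enough.

Assume that $a_{jk} > 2 b_{jk}^2$. Since all $\lambda_k$'s have the same sign, $\lambda_j \lambda_k > 0$ for all $(j, k)$ pairs, and
from (24) the condition

\[
s_{jk}(t) > 0, \quad \forall t > 0 \tag{70}
\]

is sufficient to guarantee that $\frac{df(tT)}{dt} > 0$, $\forall t > 0$. We will now show that $a_{jk} > 2 b_{jk}^2$ implies the
condition in (70). We have

\[
s_{jk}(t) = e^{-(a_{jk} t^2 + 2 b_{jk} t)} (a_{jk} t + b_{jk}) + e^{-(a_{jk} t^2 - 2 b_{jk} t)} (a_{jk} t - b_{jk}) > 0 \tag{71}
\]
\[
\Longleftrightarrow \ e^{4 b_{jk} t} (a_{jk} t - b_{jk}) + (a_{jk} t + b_{jk}) > 0 \tag{72}
\]
\[
\Longleftrightarrow \ a_{jk} t \left( e^{4 b_{jk} t} + 1 \right) > b_{jk} \left( e^{4 b_{jk} t} - 1 \right). \tag{73}
\]

Notice that if $b_{jk} = 0$, the condition (70) is already satisfied. We now consider the two other
cases, $b_{jk} < 0$ and $b_{jk} > 0$.

• Let $b_{jk} < 0$. Then the condition in (73) is equivalent to the condition

\[
\frac{a_{jk} t}{b_{jk}} < \frac{e^{4 b_{jk} t} - 1}{e^{4 b_{jk} t} + 1}. \tag{74}
\]

By the hypothesis $a_{jk} > 2 b_{jk}^2$, we have $\frac{a_{jk} t}{b_{jk}} < 2 b_{jk} t$. Therefore, it is sufficient to show that

\[
2 b_{jk} t < \frac{e^{4 b_{jk} t} - 1}{e^{4 b_{jk} t} + 1}
\]

in order for (74) to hold. Putting $u = b_{jk} t$ in the above expression, one can easily verify
the inequality

\[
2u < \frac{e^{4u} - 1}{e^{4u} + 1}
\]

by plotting for $u < 0$.

• Let $b_{jk} > 0$. Then the condition in (73) is equivalent to the condition

\[
\frac{a_{jk} t}{b_{jk}} > \frac{e^{4 b_{jk} t} - 1}{e^{4 b_{jk} t} + 1}. \tag{75}
\]

Similarly to the previous case, the hypothesis $a_{jk} > 2 b_{jk}^2$ implies $\frac{a_{jk} t}{b_{jk}} > 2 b_{jk} t$. Therefore,
it is sufficient to show that

\[
2 b_{jk} t > \frac{e^{4 b_{jk} t} - 1}{e^{4 b_{jk} t} + 1}
\]

which can be verified by letting $u = b_{jk} t$ and checking that

\[
2u > \frac{e^{4u} - 1}{e^{4u} + 1}
\]

for $u > 0$.

We have thus proven that the condition $a_{jk} > 2 b_{jk}^2$ guarantees that $\frac{df(tT)}{dt}$ has no zero-crossings
for positive $t$. Now we examine the condition

\[
\hat{a}_{jk} > 2 \hat{b}_{jk}^2
\]

for smoothed versions of the pattern. The LHS term $\hat{a}_{jk}$ is $O\big((1 + \rho^2)^{-1}\big)$ and the RHS term
$2 \hat{b}_{jk}^2$ is $O\big((1 + \rho^2)^{-2}\big)$. Therefore, for each $T$, there exists $0 < \rho_T < \infty$ such that the condition
$\hat{a}_{jk} > 2 \hat{b}_{jk}^2$ is met for all $\rho > \rho_T$. Letting

\[
\rho_0 = \max_{T} \rho_T
\]

concludes the proof of the proposition.

□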
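The proof above verifies the two scalar inequalities "by plotting". As a minimal sketch of an equivalent numerical check (the grids and the test values of $a_{jk}$, $b_{jk}$ are assumptions made for illustration), note that $(e^{4u} - 1)/(e^{4u} + 1) = \tanh(2u)$, so both inequalities reduce to $|\tanh(2u)| < 2|u|$ for $u \neq 0$. The second helper evaluates $s_{jk}(t)$ on a grid to confirm positivity when $a_{jk} > 2 b_{jk}^2$.

```python
import numpy as np


def ratio(u):
    """(e^{4u} - 1) / (e^{4u} + 1), which equals tanh(2u)."""
    return np.expm1(4.0 * u) / (np.exp(4.0 * u) + 1.0)


def s_jk(t, a, b):
    """s_jk(t) as in (58), for scalar a > 0 and b."""
    return (np.exp(-(a * t**2 + 2 * b * t)) * (a * t + b)
            + np.exp(-(a * t**2 - 2 * b * t)) * (a * t - b))


def min_s(a, b, t_max=10.0, n=2000):
    """Smallest value of s_jk on a grid of strictly positive t."""
    t = np.linspace(t_max / n, t_max, n)
    return float(np.min(s_jk(t, a, b)))


if __name__ == "__main__":
    u = np.linspace(1e-3, 10.0, 2000)
    print(bool(np.all(2 * u > ratio(u))), bool(np.all(-2 * u < ratio(-u))))
    print(min_s(1.0, 0.6), min_s(1.0, -0.6))  # a = 1 > 2 b^2 = 0.72
```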
B Proofs of alignment accuracy results for Gaussian noise model

B.1 Derivation of $E[h(tT)]$

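The expectation of $h(tT)$ is given by

\[
E[h(tT)] = -2 E\left[ \int_{\mathbb{R}^2} p(X)\, w(X - tT)\, dX \right] + E\left[ \int_{\mathbb{R}^2} w^2(X - tT)\, dX \right] + 2 E\left[ \int_{\mathbb{R}^2} p(X - tT)\, w(X - tT)\, dX \right].
\]

The separate terms above are calculated as follows. First,

\[
\begin{aligned}
E\left[ \int_{\mathbb{R}^2} p(X)\, w(X - tT)\, dX \right]
&= E\left[ \int_{\mathbb{R}^2} \sum_{k=1}^{K} \sum_{l=1}^{L} \lambda_k \zeta_l\, \phi_{\gamma_k}(X)\, \phi_{\xi_l}(X - tT)\, dX \right] \\
&= \int_{\mathbb{R}^2} \sum_{k=1}^{K} \sum_{l=1}^{L} \lambda_k\, E[\zeta_l]\, \phi_{\gamma_k}(X)\, E\big[\phi_{\xi_l}(X - tT)\big]\, dX = 0
\end{aligned} \tag{76}
\]

since the noise atom coefficients $\zeta_l$ and the atom parameters $\xi_l$ are independent, and $E[\zeta_l] = 0$.
Similarly,

\[
E\left[ \int_{\mathbb{R}^2} p(X - tT)\, w(X - tT)\, dX \right] = 0. \tag{77}
\]

Then,

\[
\begin{aligned}
E\left[ \int_{\mathbb{R}^2} w^2(X - tT)\, dX \right]
&= E\left[ \int_{\mathbb{R}^2} w^2(X)\, dX \right]
= E\left[ \sum_{l=1}^{L} \sum_{m=1}^{L} \zeta_l \zeta_m \int_{\mathbb{R}^2} \phi_{\xi_l}(X)\, \phi_{\xi_m}(X)\, dX \right] \\
&= \sum_{l=1}^{L} \sum_{m=1}^{L} E[\zeta_l \zeta_m]\, E\left[ \int_{\mathbb{R}^2} \phi_{\xi_l}(X)\, \phi_{\xi_m}(X)\, dX \right]
= \sum_{l=1}^{L} E[\zeta_l^2]\, E\left[ \int_{\mathbb{R}^2} \phi_{\xi_l}^2(X)\, dX \right]
\end{aligned}
\]

since $E[\zeta_l \zeta_m] = 0$ when $l \neq m$. Now, from Proposition 2,

\[
\int_{\mathbb{R}^2} \phi_{\xi_l}^2(X)\, dX = \frac{\pi |E|^2}{2 \sqrt{|\Sigma_{ll}|}} \exp(0) = \frac{\pi |E|^2}{2 |E|} = \frac{\pi}{2} |E| \tag{78}
\]

where the matrix $\Sigma_{ll}$ is given by $\Sigma_{ll} = \frac{1}{2}(E^2 + E^2) = E^2$, as the rotation matrices of noise atoms
are identity. Finally, since $E[\zeta_l^2] = \eta^2$,

\[
E\left[ \int_{\mathbb{R}^2} w^2(X - tT)\, dX \right] = \frac{\pi}{2}\, L\, \eta^2 |E|. \tag{79}
\]

Therefore, we obtain

\[
E[h(tT)] = \frac{\pi}{2}\, L\, \eta^2 |E|.
\]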
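The atom energy used in the last step can be checked by direct numerical integration. The sketch below rests on assumptions that are not restated in this appendix: the Gaussian mother function is taken as $\phi(X) = e^{-X^T X}$, and a noise atom is taken as $\phi(E^{-1}(X - \delta))$ with a diagonal scale matrix $E = \mathrm{diag}(e_1, e_2)$; the function names are hypothetical. Under these assumptions the energy of one atom should equal $\frac{\pi}{2}|E| = \frac{\pi}{2} e_1 e_2$.

```python
import numpy as np


def atom_energy(e1, e2, half_width=12.0, n=1201):
    """Riemann-sum approximation of the integral of phi^2 over R^2.

    Assumes phi(X) = exp(-(x/e1)^2 - (y/e2)^2), i.e. an unrotated Gaussian
    atom centered at the origin (the integral is translation invariant).
    """
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    xx, yy = np.meshgrid(x, x)
    phi = np.exp(-((xx / e1) ** 2 + (yy / e2) ** 2))
    return float(np.sum(phi ** 2) * dx * dx)


def expected_noise_energy(num_atoms, eta, e1, e2):
    """Closed form (pi / 2) L eta^2 |E| for the expected noise energy."""
    return 0.5 * np.pi * num_atoms * eta**2 * e1 * e2


if __name__ == "__main__":
    print(atom_energy(1.0, 2.0), 0.5 * np.pi * 1.0 * 2.0)
```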
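B.2 Derivation of Eq. (31)

Since the global minimum of the noiseless distance function

\[
f(tT) = \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big)^2\, dX
\]

is at 0,

\[
\left. \frac{df(tT)}{dt} \right|_{t=0} = 0 \tag{80}
\]

for all unit directions $T$. Similarly, as the global minimum of $g$ is at $t_0 T_0$,

\[
\left. \frac{dg(tT_0)}{dt} \right|_{t=t_0} = 0. \tag{81}
\]

Let us now regard $g(tT_0)$ as a function of $t$. Using the Taylor Remainder Theorem, $g(tT_0)$ can
be expanded around $t_0$ as

\[
g(tT_0) = g(t_0 T_0) + (t - t_0) \left. \frac{dg(tT_0)}{dt} \right|_{t=t_0} + \frac{(t - t_0)^2}{2} \left. \frac{d^2 g(tT_0)}{dt^2} \right|_{t=t_1}
\]

for some $t_1 \in [t, t_0]$, assuming that $t \le t_0$. Due to (81), the expression of $g(tT_0)$ is reduced to

\[
g(tT_0) = g(t_0 T_0) + \frac{(t - t_0)^2}{2} \left. \frac{d^2 g(tT_0)}{dt^2} \right|_{t=t_1}.
\]

Evaluating $g(tT_0)$ at $t = 0$, we obtain

\[
g(0) = g(t_0 T_0) + \frac{t_0^2}{2} \left. \frac{d^2 g(tT_0)}{dt^2} \right|_{t=t_1} \tag{82}
\]

for some $t_1 \in [0, t_0]$. As $g(tT) = f(tT) + h(tT)$, (82) can be rewritten as

\[
f(0) + h(0) = f(t_0 T_0) + h(t_0 T_0) + \frac{t_0^2}{2} \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_1} + \frac{t_0^2}{2} \left. \frac{d^2 h(tT_0)}{dt^2} \right|_{t=t_1}. \tag{83}
\]

Then, considering $f(tT_0)$ as a function of $t$ and expanding it around 0 in a similar way, we get

\[
f(tT_0) = f(0) + t \left. \frac{df(tT_0)}{dt} \right|_{t=0} + \frac{t^2}{2} \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_2}
= f(0) + \frac{t^2}{2} \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_2}
\]

for some $t_2 \in [0, t]$, where $t \ge 0$ and the second equality follows from (80). Evaluating the above
expression of $f(tT_0)$ at $t = t_0$, we get

\[
f(t_0 T_0) = f(0) + \frac{t_0^2}{2} \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_2} \tag{84}
\]

for some $t_2 \in [0, t_0]$. Since the global minimum of $f$ is at 0 and the global minimum of $g$ is at
$t_0 T_0$, we have

\[
g(0) - g(t_0 T_0) \ge 0, \qquad f(0) - f(t_0 T_0) \le 0
\]

which gives

\[
h(0) - h(t_0 T_0) = \big(g(0) - g(t_0 T_0)\big) - \big(f(0) - f(t_0 T_0)\big) \ge 0. \tag{85}
\]

Therefore,

\[
h(0) - h(t_0 T_0) = |h(0) - h(t_0 T_0)|. \tag{86}
\]

From (83) and (84) we have

\[
f(t_0 T_0) - f(0) = h(0) - h(t_0 T_0) - \frac{t_0^2}{2} \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_1} - \frac{t_0^2}{2} \left. \frac{d^2 h(tT_0)}{dt^2} \right|_{t=t_1}
= \frac{t_0^2}{2} \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_2}.
\]

Combining this with (86) yields the following equation for $t_0$:

\[
\frac{t_0^2}{2} \left( \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_2} + \left. \frac{d^2 f(tT_0)}{dt^2} \right|_{t=t_1} + \left. \frac{d^2 h(tT_0)}{dt^2} \right|_{t=t_1} \right) = |h(0) - h(t_0 T_0)|.
\]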



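B.3 Proof of Lemma 1

Proof. The mean of the noise pattern is

\[
E[w(X)] = E\left[ \sum_{l=1}^{L} \zeta_l\, \phi_{\xi_l}(X) \right] = 0
\]

since the coefficients $\zeta_l$ are zero-mean and independent of $\xi_l$. This gives

\[
E[\Delta h(tT)] = 2 E\left[ \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big)\, w(X - tT)\, dX \right] = 0.
\]

Therefore, the variance of $\Delta h(tT)$ is given by

\[
\begin{aligned}
\sigma^2_{\Delta h(tT)}
&= 4 E\left[ \left( \int_{\mathbb{R}^2} \big(p(X) - p(X - tT)\big)\, w(X - tT)\, dX \right)^2 \right] \\
&= 4 E\left[ \int_{\mathbb{R}^2} \big(p(X + tT) - p(X)\big)\, w(X)\, dX \int_{\mathbb{R}^2} \big(p(Y + tT) - p(Y)\big)\, w(Y)\, dY \right] \\
&= 4 E\left[ \int_{\mathbb{R}^2} \big(p(X + tT) - p(X)\big) \sum_{l=1}^{L} \zeta_l\, \phi_{\xi_l}(X)\, dX \int_{\mathbb{R}^2} \big(p(Y + tT) - p(Y)\big) \sum_{m=1}^{L} \zeta_m\, \phi_{\xi_m}(Y)\, dY \right] \\
&= 4 \sum_{l=1}^{L} \sum_{m=1}^{L} E[\zeta_l \zeta_m]\, E\left[ \int_{\mathbb{R}^2} \big(p(X + tT) - p(X)\big)\, \phi_{\xi_l}(X)\, dX \int_{\mathbb{R}^2} \big(p(Y + tT) - p(Y)\big)\, \phi_{\xi_m}(Y)\, dY \right] \\
&= 4 \sum_{l=1}^{L} E[\zeta_l^2]\, E\big[(B_1 - B_2)^2\big] \\
&= 4 \eta^2 \sum_{l=1}^{L} \left( E[B_1^2] - 2 E[B_1 B_2] + E[B_2^2] \right)
\end{aligned} \tag{93}
\]

where

\[
B_1 = \int_{\mathbb{R}^2} p(X + tT)\, \phi_{\xi_l}(X)\, dX, \qquad B_2 = \int_{\mathbb{R}^2} p(X)\, \phi_{\xi_l}(X)\, dX. \tag{94}
\]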
(94) 
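Since the noise atom parameters are i.i.d., from (94) it can be seen that the values of $E[B_1^2]$,
$E[B_1 B_2]$, $E[B_2^2]$ in (93) do not depend on the atom index $l$. Hence, one can rewrite (93) as

\[
\sigma^2_{\Delta h(tT)} = 4 L \eta^2 \left( E[B_1^2] - 2 E[B_1 B_2] + E[B_2^2] \right). \tag{95}
\]

In the following, we derive upper bounds for the terms $E[B_1^2]$, $E[B_2^2]$ and a lower bound
for the term $E[B_1 B_2]$, such that from the expression in (95) we can obtain an upper bound for
$\sigma^2_{\Delta h(tT)}$. First, observe that using Proposition 2 we obtain

\[
\begin{aligned}
B_1 &= \int_{\mathbb{R}^2} \sum_{k=1}^{K} \lambda_k\, \phi_{\gamma_k}(X + tT)\, \phi_{\xi_l}(X)\, dX = \sum_{k=1}^{K} \lambda_k \int_{\mathbb{R}^2} \phi_{\gamma_k}(X + tT)\, \phi_{\xi_l}(X)\, dX \\
&= \sum_{k=1}^{K} \lambda_k\, \frac{\pi |\sigma_k E|}{2 \sqrt{|\Sigma_{kl}|}} \exp\left( -\frac{1}{2} (\delta_l - \tau_k + tT)^T\, \Sigma_{kl}^{-1}\, (\delta_l - \tau_k + tT) \right)
\end{aligned} \tag{96}
\]

where

\[
\Sigma_{kl} = \frac{1}{2} \left( \Psi_k \sigma_k^2 \Psi_k^{-1} + E^2 \right).
\]

This gives

\[
B_1 = \sum_{k=1}^{K} \lambda_k\, \frac{\pi |\sigma_k E|}{2 \sqrt{|\Sigma_{kl}|}} \exp\left( -(\delta_l - \tau_k + tT)^T\, \Phi_k\, (\delta_l - \tau_k + tT) \right) \tag{97}
\]

where $\Phi_k$ is as defined in (33). In order to simplify the notation, let us write $\delta = \delta_l$.
Then, (97) gives

\[
B_1 = \sum_{k=1}^{K} \lambda_k\, \kappa_k \exp\left( -(\delta - \tau_k + tT)^T\, \Phi_k\, (\delta - \tau_k + tT) \right)
\]

where $\kappa_k$ is as defined in (34). Similarly, one can obtain

\[
B_2 = \sum_{k=1}^{K} \lambda_k\, \kappa_k \exp\left( -(\delta - \tau_k)^T\, \Phi_k\, (\delta - \tau_k) \right).
\]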

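We start with the computation of $E[B_1^2]$. Let

\[
\mathcal{B}_{jk} := E\left[ \exp\left( -(\delta - \tau_j + tT)^T \Phi_j (\delta - \tau_j + tT) - (\delta - \tau_k + tT)^T \Phi_k (\delta - \tau_k + tT) \right) \right]. \tag{99}
\]

Then,

\[
E[B_1^2] = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \mathcal{B}_{jk}. \tag{100}
\]

The term $\mathcal{B}_{jk}$ is not easy to evaluate analytically. However, it is easier to derive an upper bound
$\overline{\mathcal{B}}_{jk}$ and a lower bound $\underline{\mathcal{B}}_{jk}$ for $\mathcal{B}_{jk}$, which can then be used in the computation of an upper
bound for $E[B_1^2]$. In order to derive $\overline{\mathcal{B}}_{jk}$ and $\underline{\mathcal{B}}_{jk}$, first let

\[
\alpha_k = \lambda_{\min}(\Phi_k), \qquad \beta_k = \lambda_{\max}(\Phi_k) \tag{101}
\]

denote respectively the smaller and greater eigenvalues of $\Phi_k$. Since $\Phi_k$ is a positive definite
matrix, $\beta_k \ge \alpha_k > 0$. Observing that

\[
\alpha_k \|\delta - \tau_k + tT\|^2 \le (\delta - \tau_k + tT)^T \Phi_k (\delta - \tau_k + tT) \le \beta_k \|\delta - \tau_k + tT\|^2
\]

for all $\delta$, we have

\[
\underline{\mathcal{B}}_{jk} \le \mathcal{B}_{jk} \le \overline{\mathcal{B}}_{jk}
\]

where

\[
\begin{aligned}
\underline{\mathcal{B}}_{jk} &= E\left[ \exp\left( -\beta_j \|\delta - \tau_j + tT\|^2 - \beta_k \|\delta - \tau_k + tT\|^2 \right) \right] \\
\overline{\mathcal{B}}_{jk} &= E\left[ \exp\left( -\alpha_j \|\delta - \tau_j + tT\|^2 - \alpha_k \|\delta - \tau_k + tT\|^2 \right) \right].
\end{aligned}
\]

Denoting $\tau_k = [\tau_{x,k}\ \tau_{y,k}]^T$ and $T = [T_x\ T_y]^T$, it follows that

\[
\underline{\mathcal{B}}_{jk} = \underline{\mathcal{B}}^x_{jk}\, \underline{\mathcal{B}}^y_{jk} \tag{102}
\]

where

\[
\begin{aligned}
\underline{\mathcal{B}}^x_{jk} &= \frac{1}{2b} \int_{-b}^{b} \exp\left( -\beta_j (\delta_x - \tau_{x,j} + tT_x)^2 - \beta_k (\delta_x - \tau_{x,k} + tT_x)^2 \right) d\delta_x \\
\underline{\mathcal{B}}^y_{jk} &= \frac{1}{2b} \int_{-b}^{b} \exp\left( -\beta_j (\delta_y - \tau_{y,j} + tT_y)^2 - \beta_k (\delta_y - \tau_{y,k} + tT_y)^2 \right) d\delta_y.
\end{aligned}
\]

Evaluating these integrals, the above terms are calculated as

\[
\underline{\mathcal{B}}^x_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\beta_j + \beta_k}} \exp\left( -(\beta_j + \beta_k) \left( \underline{G}^x_{jk}(t,t) - (\underline{H}^x_{jk}(t,t))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(t,t)) \right) \right] \tag{103}
\]
\[
\underline{\mathcal{B}}^y_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\beta_j + \beta_k}} \exp\left( -(\beta_j + \beta_k) \left( \underline{G}^y_{jk}(t,t) - (\underline{H}^y_{jk}(t,t))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^y_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^y_{jk}(t,t)) \right) \right] \tag{104}
\]

where the functions $\underline{H}^x_{jk}, \underline{G}^x_{jk}, \underline{H}^y_{jk}, \underline{G}^y_{jk} : \mathbb{R}^2 \to \mathbb{R}$ are defined as

\[
\begin{aligned}
\underline{H}^x_{jk}(u,v) &= \frac{\beta_j (u T_x - \tau_{x,j}) + \beta_k (v T_x - \tau_{x,k})}{\beta_j + \beta_k}, &
\underline{G}^x_{jk}(u,v) &= \frac{\beta_j (u T_x - \tau_{x,j})^2 + \beta_k (v T_x - \tau_{x,k})^2}{\beta_j + \beta_k}, \\
\underline{H}^y_{jk}(u,v) &= \frac{\beta_j (u T_y - \tau_{y,j}) + \beta_k (v T_y - \tau_{y,k})}{\beta_j + \beta_k}, &
\underline{G}^y_{jk}(u,v) &= \frac{\beta_j (u T_y - \tau_{y,j})^2 + \beta_k (v T_y - \tau_{y,k})^2}{\beta_j + \beta_k}.
\end{aligned}
\]

The lower bound $\underline{\mathcal{B}}_{jk}$ is thus obtained by substituting (103) and (104) in (102).
Replacing $\beta_j$, $\beta_k$ by $\alpha_j$, $\alpha_k$ in the above equations, one can similarly compute the upper
bound $\overline{\mathcal{B}}_{jk}$ as

\[
\overline{\mathcal{B}}_{jk} = \overline{\mathcal{B}}^x_{jk}\, \overline{\mathcal{B}}^y_{jk} \tag{106}
\]

where

\[
\overline{\mathcal{B}}^x_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\alpha_j + \alpha_k}} \exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^x_{jk}(t,t) - (\overline{H}^x_{jk}(t,t))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^x_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^x_{jk}(t,t)) \right) \right] \tag{107}
\]
\[
\overline{\mathcal{B}}^y_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\alpha_j + \alpha_k}} \exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^y_{jk}(t,t) - (\overline{H}^y_{jk}(t,t))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^y_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^y_{jk}(t,t)) \right) \right] \tag{108}
\]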
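and $\overline{H}^x_{jk}, \overline{G}^x_{jk}, \overline{H}^y_{jk}, \overline{G}^y_{jk} : \mathbb{R}^2 \to \mathbb{R}$ are given by

\[
\begin{aligned}
\overline{H}^x_{jk}(u,v) &= \frac{\alpha_j (u T_x - \tau_{x,j}) + \alpha_k (v T_x - \tau_{x,k})}{\alpha_j + \alpha_k}, &
\overline{G}^x_{jk}(u,v) &= \frac{\alpha_j (u T_x - \tau_{x,j})^2 + \alpha_k (v T_x - \tau_{x,k})^2}{\alpha_j + \alpha_k}, \\
\overline{H}^y_{jk}(u,v) &= \frac{\alpha_j (u T_y - \tau_{y,j}) + \alpha_k (v T_y - \tau_{y,k})}{\alpha_j + \alpha_k}, &
\overline{G}^y_{jk}(u,v) &= \frac{\alpha_j (u T_y - \tau_{y,j})^2 + \alpha_k (v T_y - \tau_{y,k})^2}{\alpha_j + \alpha_k}.
\end{aligned}
\]

Having thus found an upper bound $\overline{\mathcal{B}}_{jk}$ and a lower bound $\underline{\mathcal{B}}_{jk}$ for $\mathcal{B}_{jk}$, we can now compute an
upper bound for $E[B_1^2]$. Since $\kappa_j$, $\kappa_k$, $\overline{\mathcal{B}}_{jk}$ and $\underline{\mathcal{B}}_{jk}$ are positive, from (100) we have

\[
\begin{aligned}
E[B_1^2] &= \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \mathcal{B}_{jk}
= \sum_{(j,k) \in J^+} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \mathcal{B}_{jk} + \sum_{(j,k) \in J^-} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \mathcal{B}_{jk} \\
&\le \sum_{(j,k) \in J^+} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \overline{\mathcal{B}}_{jk} + \sum_{(j,k) \in J^-} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \underline{\mathcal{B}}_{jk}
\end{aligned} \tag{109}
\]

where $J^+$ and $J^-$ are as defined in (35).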

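This concludes the calculation of an upper bound for $E[B_1^2]$. In the following, we apply a
similar procedure in order to obtain a lower bound for $E[B_1 B_2]$ and an upper bound for $E[B_2^2]$.
Let

\[
\mathcal{C}_{jk} := E\left[ \exp\left( -(\delta - \tau_j + tT)^T \Phi_j (\delta - \tau_j + tT) - (\delta - \tau_k)^T \Phi_k (\delta - \tau_k) \right) \right].
\]

Then

\[
E[B_1 B_2] = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \mathcal{C}_{jk}. \tag{110}
\]

The term $\mathcal{C}_{jk}$ can be lower and upper bounded as

\[
\underline{\mathcal{C}}_{jk} \le \mathcal{C}_{jk} \le \overline{\mathcal{C}}_{jk}
\]

where

\[
\underline{\mathcal{C}}_{jk} = \underline{\mathcal{C}}^x_{jk}\, \underline{\mathcal{C}}^y_{jk} \tag{111}
\]
\[
\overline{\mathcal{C}}_{jk} = \overline{\mathcal{C}}^x_{jk}\, \overline{\mathcal{C}}^y_{jk} \tag{112}
\]

and

\[
\underline{\mathcal{C}}^x_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\beta_j + \beta_k}} \exp\left( -(\beta_j + \beta_k) \left( \underline{G}^x_{jk}(t,0) - (\underline{H}^x_{jk}(t,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(t,0)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(t,0)) \right) \right] \tag{113}
\]
\[
\underline{\mathcal{C}}^y_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\beta_j + \beta_k}} \exp\left( -(\beta_j + \beta_k) \left( \underline{G}^y_{jk}(t,0) - (\underline{H}^y_{jk}(t,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^y_{jk}(t,0)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^y_{jk}(t,0)) \right) \right] \tag{114}
\]
\[
\overline{\mathcal{C}}^x_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\alpha_j + \alpha_k}} \exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^x_{jk}(t,0) - (\overline{H}^x_{jk}(t,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^x_{jk}(t,0)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^x_{jk}(t,0)) \right) \right] \tag{115}
\]
\[
\overline{\mathcal{C}}^y_{jk} = \frac{\sqrt{\pi}}{4b \sqrt{\alpha_j + \alpha_k}} \exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^y_{jk}(t,0) - (\overline{H}^y_{jk}(t,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^y_{jk}(t,0)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^y_{jk}(t,0)) \right) \right]. \tag{116}
\]

A lower bound for $E[B_1 B_2]$ is thus given by

\[
E[B_1 B_2] \ge \sum_{(j,k) \in J^+} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \underline{\mathcal{C}}_{jk} + \sum_{(j,k) \in J^-} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \overline{\mathcal{C}}_{jk}. \tag{117}
\]

Finally, defining

\[
\mathcal{D}_{jk} := E\left[ \exp\left( -(\delta - \tau_j)^T \Phi_j (\delta - \tau_j) - (\delta - \tau_k)^T \Phi_k (\delta - \tau_k) \right) \right],
\]

we have

\[
E[B_2^2] = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \mathcal{D}_{jk}.
\]

Then, bounding $\mathcal{D}_{jk}$ as

\[
\underline{\mathcal{D}}_{jk} \le \mathcal{D}_{jk} \le \overline{\mathcal{D}}_{jk}
\]

where

\[
\underline{\mathcal{D}}_{jk} = \underline{\mathcal{D}}^x_{jk}\, \underline{\mathcal{D}}^y_{jk}, \qquad \overline{\mathcal{D}}_{jk} = \overline{\mathcal{D}}^x_{jk}\, \overline{\mathcal{D}}^y_{jk}
\]

and

\[
\begin{aligned}
\underline{\mathcal{D}}^x_{jk} &= \frac{\sqrt{\pi}}{4b \sqrt{\beta_j + \beta_k}} \exp\left( -(\beta_j + \beta_k) \left( \underline{G}^x_{jk}(0,0) - (\underline{H}^x_{jk}(0,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(0,0)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(0,0)) \right) \right] \\
\underline{\mathcal{D}}^y_{jk} &= \frac{\sqrt{\pi}}{4b \sqrt{\beta_j + \beta_k}} \exp\left( -(\beta_j + \beta_k) \left( \underline{G}^y_{jk}(0,0) - (\underline{H}^y_{jk}(0,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^y_{jk}(0,0)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^y_{jk}(0,0)) \right) \right] \\
\overline{\mathcal{D}}^x_{jk} &= \frac{\sqrt{\pi}}{4b \sqrt{\alpha_j + \alpha_k}} \exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^x_{jk}(0,0) - (\overline{H}^x_{jk}(0,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^x_{jk}(0,0)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^x_{jk}(0,0)) \right) \right] \\
\overline{\mathcal{D}}^y_{jk} &= \frac{\sqrt{\pi}}{4b \sqrt{\alpha_j + \alpha_k}} \exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^y_{jk}(0,0) - (\overline{H}^y_{jk}(0,0))^2 \right) \right)
\left[ \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^y_{jk}(0,0)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^y_{jk}(0,0)) \right) \right],
\end{aligned}
\]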
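one can upper bound $E[B_2^2]$ as follows

\[
E[B_2^2] \le \sum_{(j,k) \in J^+} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \overline{\mathcal{D}}_{jk} + \sum_{(j,k) \in J^-} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \underline{\mathcal{D}}_{jk}. \tag{124}
\]

We can finally combine the bounds (109), (117) and (124) with the equality in (95), which gives
an upper bound for $\sigma^2_{\Delta h(tT)}$:

\[
\sigma^2_{\Delta h(tT)} \le 4 L \eta^2 \left( \sum_{(j,k) \in J^+} \lambda_j \kappa_j\, \lambda_k \kappa_k \left( \overline{\mathcal{B}}_{jk} - 2\, \underline{\mathcal{C}}_{jk} + \overline{\mathcal{D}}_{jk} \right)
+ \sum_{(j,k) \in J^-} \lambda_j \kappa_j\, \lambda_k \kappa_k \left( \underline{\mathcal{B}}_{jk} - 2\, \overline{\mathcal{C}}_{jk} + \underline{\mathcal{D}}_{jk} \right) \right).
\]

Defining

\[
\overline{\mathcal{E}}_{jk} := \overline{\mathcal{B}}_{jk} - 2\, \underline{\mathcal{C}}_{jk} + \overline{\mathcal{D}}_{jk}, \qquad
\underline{\mathcal{E}}_{jk} := \underline{\mathcal{B}}_{jk} - 2\, \overline{\mathcal{C}}_{jk} + \underline{\mathcal{D}}_{jk} \tag{125}
\]

we can rewrite the above inequality as

\[
\sigma^2_{\Delta h(tT)} \le R_{\sigma^2_{\Delta h(tT)}} := 4 L \eta^2 \left( \sum_{(j,k) \in J^+} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \overline{\mathcal{E}}_{jk} + \sum_{(j,k) \in J^-} \lambda_j \kappa_j\, \lambda_k \kappa_k\, \underline{\mathcal{E}}_{jk} \right) \tag{126}
\]

which gives the upper bound $R_{\sigma^2_{\Delta h(tT)}}$ for $\sigma^2_{\Delta h(tT)}$ as stated in the lemma.

Finally, $\overline{\mathcal{B}}_{jk}$, $\underline{\mathcal{B}}_{jk}$, $\overline{\mathcal{C}}_{jk}$, $\underline{\mathcal{C}}_{jk}$, $\overline{\mathcal{D}}_{jk}$, $\underline{\mathcal{D}}_{jk}$ are bounded functions of $t$ and $T$, as the $\mathrm{erf}(\cdot)$ function
is bounded and the terms such as $\overline{G}^x_{jk}(u,v) - (\overline{H}^x_{jk}(u,v))^2$ and $\overline{G}^y_{jk}(u,v) - (\overline{H}^y_{jk}(u,v))^2$ that are
inside the exponentials with a negative coefficient are always in the form

\[
\frac{a_j b_j^2 + a_k b_k^2}{a_j + a_k} - \left( \frac{a_j b_j + a_k b_k}{a_j + a_k} \right)^2 = \frac{a_j a_k (b_j - b_k)^2}{(a_j + a_k)^2} \ge 0;
\]

therefore, nonnegative. For this reason, the terms $\overline{\mathcal{E}}_{jk}$ and $\underline{\mathcal{E}}_{jk}$ are bounded functions of $t$ and
$T$.

□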
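The algebraic identity used in the last step of the proof, which states that the weighted mean of squares minus the squared weighted mean is nonnegative, can be confirmed on random inputs. This is a small sketch with hypothetical function names; the weights `aj`, `ak` play the role of the positive eigenvalue bounds and `bj`, `bk` the shifted coordinates.

```python
import numpy as np


def variance_gap(aj, ak, bj, bk):
    """(aj bj^2 + ak bk^2)/(aj + ak) - ((aj bj + ak bk)/(aj + ak))^2."""
    g = (aj * bj**2 + ak * bk**2) / (aj + ak)
    h = (aj * bj + ak * bk) / (aj + ak)
    return g - h**2


def closed_form(aj, ak, bj, bk):
    """aj ak (bj - bk)^2 / (aj + ak)^2, the right-hand side of the identity."""
    return aj * ak * (bj - bk) ** 2 / (aj + ak) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aj, ak = rng.uniform(0.1, 5.0, size=(2, 1000))  # positive weights
    bj, bk = rng.normal(size=(2, 1000))
    print(np.allclose(variance_gap(aj, ak, bj, bk), closed_form(aj, ak, bj, bk)))
```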



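B.4 Proof of Corollary 1

Proof. We observe from (36) that in order to derive a uniform upper bound for $R_{\sigma^2_{\Delta h(tT)}}$ over
$B_{t_0}(0)$, we need to find a uniform upper bound for $\overline{\mathcal{E}}_{jk}$ and a uniform lower bound for $\underline{\mathcal{E}}_{jk}$.
The definition of $\overline{\mathcal{E}}_{jk}$ and $\underline{\mathcal{E}}_{jk}$ in (125) shows that this requires the derivation of uniform upper
bounds for $\overline{\mathcal{B}}_{jk}$, $\overline{\mathcal{C}}_{jk}$, and uniform lower bounds for $\underline{\mathcal{B}}_{jk}$, $\underline{\mathcal{C}}_{jk}$. In the following, we derive these.
Observe that $\overline{\mathcal{D}}_{jk}$ and $\underline{\mathcal{D}}_{jk}$ do not have a dependence on $t$ and $T$; therefore, they can be used
directly.

• We begin with $\overline{\mathcal{B}}_{jk}$. We first examine the term $\overline{\mathcal{B}}^x_{jk}$ in (107). Since

\[
\overline{G}^x_{jk}(t,t) - (\overline{H}^x_{jk}(t,t))^2 = \frac{\alpha_j \alpha_k (\tau_{x,k} - \tau_{x,j})^2}{(\alpha_j + \alpha_k)^2},
\]

the term

\[
\exp\left( -(\alpha_j + \alpha_k) \left( \overline{G}^x_{jk}(t,t) - (\overline{H}^x_{jk}(t,t))^2 \right) \right) = \exp\left( -\frac{\alpha_j \alpha_k (\tau_{x,k} - \tau_{x,j})^2}{\alpha_j + \alpha_k} \right) \tag{127}
\]

is independent of $t$ and $T$. Next, as $b + \overline{H}^x_{jk}(t,t) > -b + \overline{H}^x_{jk}(t,t)$ and $\mathrm{erf}(\cdot)$ is a monotonically
increasing function with the asymptotic values $-1$ and $1$ at $-\infty$ and $\infty$,

\[
0 < \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (b + \overline{H}^x_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\alpha_j + \alpha_k}\, (-b + \overline{H}^x_{jk}(t,t)) \right) < 2. \tag{128}
\]

From (107), (127) and (128), we get

\[
\overline{\mathcal{B}}^x_{jk} \le \frac{\sqrt{\pi}}{2b \sqrt{\alpha_j + \alpha_k}} \exp\left( -\frac{\alpha_j \alpha_k (\tau_{x,k} - \tau_{x,j})^2}{\alpha_j + \alpha_k} \right).
\]

Similarly, for $\overline{\mathcal{B}}^y_{jk}$ we obtain

\[
\overline{\mathcal{B}}^y_{jk} \le \frac{\sqrt{\pi}}{2b \sqrt{\alpha_j + \alpha_k}} \exp\left( -\frac{\alpha_j \alpha_k (\tau_{y,k} - \tau_{y,j})^2}{\alpha_j + \alpha_k} \right).
\]

The product of the above bounds for $\overline{\mathcal{B}}^x_{jk}$ and $\overline{\mathcal{B}}^y_{jk}$ gives the following uniform upper bound
for $\overline{\mathcal{B}}_{jk}$:

\[
\overline{\mathcal{B}}_{jk} \le \frac{\pi}{4 b^2 (\alpha_j + \alpha_k)} \exp\left( -\frac{\alpha_j \alpha_k \|\tau_k - \tau_j\|^2}{\alpha_j + \alpha_k} \right). \tag{129}
\]

Then, in order to uniformly lower bound $\underline{\mathcal{B}}_{jk}$, we first examine the term $\underline{\mathcal{B}}^x_{jk}$ in (103).
Observe that

\[
\exp\left( -(\beta_j + \beta_k) \left( \underline{G}^x_{jk}(t,t) - (\underline{H}^x_{jk}(t,t))^2 \right) \right) = \exp\left( -\frac{\beta_j \beta_k (\tau_{x,k} - \tau_{x,j})^2}{\beta_j + \beta_k} \right) \tag{130}
\]

is independent of $t$ and $T$ as before. Next, we need to derive a lower bound for the term

\[
\mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(t,t)) \right)
\]

which is the difference between the evaluations of the $\mathrm{erf}(\cdot)$ function at two different values
$\sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(t,t))$ and $\sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(t,t))$. From the Mean Value Theorem,
there exists $\nu$ with

\[
\sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(t,t)) \le \nu \le \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(t,t))
\]

such that

\[
\mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (b + \underline{H}^x_{jk}(t,t)) \right) - \mathrm{erf}\left( \sqrt{\beta_j + \beta_k}\, (-b + \underline{H}^x_{jk}(t,t)) \right)
= \sqrt{\beta_j + \beta_k}\; 2b \left. \frac{d}{du} \mathrm{erf}(u) \right|_{u = \nu} = \sqrt{\beta_j + \beta_k}\; 2b\, \frac{2}{\sqrt{\pi}}\, e^{-\nu^2}.
\]

We thus need to find a lower bound for $e^{-\nu^2}$. Since $tT \in B_{t_0}(0)$, $-t_0 \le tT_x \le t_0$. This
yields

\[
\begin{aligned}
b + \underline{H}^x_{jk}(t,t) &\le b + t_0 - \frac{\beta_j \tau_{x,j} + \beta_k \tau_{x,k}}{\beta_j + \beta_k} \\
-b + \underline{H}^x_{jk}(t,t) &\ge -b - t_0 - \frac{\beta_j \tau_{x,j} + \beta_k \tau_{x,k}}{\beta_j + \beta_k}.
\end{aligned} \tag{131}
\]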
As the minimum of the function e~'^ over a closed interval may occur only at the end 
points of the interval, we have 

erf (y^7T^(6 + ^;^,(i,t))) - erf (v/^7T^(-6 + ^^\(t,i))) > 

(132) 
where



&ffc — (/3j + /3fe) max <^ [b + to- 



-b-to- 



ft + h 



is obtained using (131). 

Now using the bounds (132) and (130) in (103), we get 
W,k > exp f-b - ~ 



Similarly, one can show that 



where 



(133) 



(134) 



Pj'^y-j + PkTy.k 



, J PjTy^j + ftTj,,fc 



Finally, using (1331 and (134) in (102), we obtain the uniform lower bound S on S.j. as 

— 3 k 3 

follows 



^3k ^ — exp 



^Jk ^jk 



ft-ft||rfc-Tj-|| 
(ft+ft) 



(135) 



• Next, we compute a uniform upper bound for Cjk. We begin with the term Cjf. in (115). 
The first part of the expression is upper bounded as 



exp 



exp 



-(a,+afc)(G;,(i,0)-(ff;,(i,0))2)) 

Putting this together with 

erf (& + i7;,(t, 0))) - erf (y^TT^ {-b + E]^(t, 0))) 



< 1. 



< 2 



we get 



Similarly, one can obtain 



C < V 
^ ' 26 ^oT+Ofc 



26 ^ttj + Qffe 
which gives the uniform upper bound 

^=462 («,+a,)- 

• Lastly, we derive a uniform lower bound for C^j,. Since —to < tT^; < toi 

< (tTx + T2,^fe - Tx.jf < max {(-to + r^^k - Tx,jT, (to + Tx^k - Tx,jT] 



(136) 



Therefore, the first term in the expression of C^jf. in (113) can be lower bounded as 

exp (-(ft + ft) (GJ,(t, 0) - (^|,(t, Q)f)) > e-^- (137) 






where 



{( — ^0 +Tx,k — Tkj)^, (^0 +Tx^k — Tx,j)^} ■ 



Then, we derive a lower bound for the term 

erf (y^-T^ (fc + H^kit, 0))) - erf (^^7+^ {-b + H^.it, 0))) 
The condition $-t_0 \le tT_x \le t_0$ implies that



b + mk{t,Q) <b + to 



A- + Pk A- + Pk 



Pj + Pk Pj + Pk 

Then, applying the Mean Value Theorem as in the derivation of the lower bound for $\underline{\mathcal{B}}^x_{jk}$, we get

erf (y^7TA:(6 + ^^^fe(t,0))) - erf (y^7T^(-6 + ^^-,(i,0))) > i|y^-T^e-S^. 

(138) 

where 



c,, iP, + max ^ ( 5 + to— ^— — 



-b-to 



/3. 



PjTx,] + PkTx.k 



Using the results (137) and (1381 in (113) yields 
One can similarly show that 



c%>^M-~^)k-^k) 



Pj + Pk Pj + Pk 

(139) 
(140) 



where 



and 



^■'^ ■ /3, + Pk 



-{{-to + Ty^k - Tyjf, {to + Ty^k " TyJ^] 



■jfe := (^j +/3fc)max<^ U + to 



Pj Pj'^y.J + PkTyM 



P,+Pk P,+Pk 



+ PkTy^k 



Pk Pj + Pk 

Finally, using (139 1 and (1401 in (111), we obtain the uniform lower bound C_^^ on C^j, as 

(141) 



follows 



^ik > = cxp ( -c^fc - c% - - 5^ 



'-jk '-jk '^jk '^jk 



We have now finished the derivation of the upper bounds Bjk, Cjk for Bjk, Cjk', and the 
lower bounds B^^, C_.^ for Bji^, Cji^. Defining 



we have 



Cjk Bjk - 2£^.^, + Pjfe 



Cjfc > €jk 
^]k < <^jk- 
Using the above inequalities in the statement of Lemma 1, we obtain the sought uniform upper
bound $R_{\sigma^2_{\Delta h}}$ on $R_{\sigma^2_{\Delta h(tT)}}$.

□
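The Mean Value Theorem step used for the erf differences in this proof can be illustrated numerically. The sketch below is a generic check, not the paper's specific bound: for $x_1 < x_2$ it compares $\mathrm{erf}(x_2) - \mathrm{erf}(x_1)$ with the lower bound $(x_2 - x_1)\frac{2}{\sqrt{\pi}} e^{-\max(x_1^2, x_2^2)}$, which follows because $e^{-\nu^2}$ attains its minimum over $[x_1, x_2]$ at an end point, and with the trivial upper bound 2. The function name is hypothetical.

```python
import math


def erf_gap_lower_bound(x1, x2):
    """Lower bound on erf(x2) - erf(x1) for x1 < x2 via the Mean Value Theorem.

    erf'(u) = (2 / sqrt(pi)) exp(-u^2), and exp(-u^2) on [x1, x2] is smallest
    at the end point with the larger magnitude.
    """
    return (x2 - x1) * 2.0 / math.sqrt(math.pi) * math.exp(-max(x1 * x1, x2 * x2))


if __name__ == "__main__":
    for x1, x2 in [(-1.0, 3.0), (0.2, 0.9), (-2.5, -0.5), (-4.0, 4.0)]:
        gap = math.erf(x2) - math.erf(x1)
        print(x1, x2, erf_gap_lower_bound(x1, x2), gap)
```

In the proof, $x_1$ and $x_2$ correspond to $\sqrt{\beta_j + \beta_k}\,(\mp b + H)$, so that $x_2 - x_1 = 2b\sqrt{\beta_j + \beta_k}$.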



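B.5 Proof of Lemma 2

Proof. In Section 3.2 we have shown that the first derivative of the noiseless distance function
has the form

\[
\frac{df(tT)}{dt} = \sum_{j=1}^{K} \sum_{k=1}^{K} \frac{dI_{jk}(t)}{dt} = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k\, Q_{jk}\, s_{jk}(t)
\]

where the terms $Q_{jk}$ and $s_{jk}(t)$ are as defined in Section 3.2 (see (25) for $s_{jk}(t)$). The second
derivative of $f$ is therefore given by

\[
\frac{d^2 f(tT)}{dt^2} = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k\, Q_{jk}\, \frac{ds_{jk}(t)}{dt} \tag{142}
\]

where

\[
\frac{ds_{jk}(t)}{dt} = e^{-(a_{jk} t^2 + 2 b_{jk} t)} \left( a_{jk} - 2 (a_{jk} t + b_{jk})^2 \right) + e^{-(a_{jk} t^2 - 2 b_{jk} t)} \left( a_{jk} - 2 (a_{jk} t - b_{jk})^2 \right)
\]

and the terms $a_{jk}$ and $b_{jk}$ are as defined in Section 3.2.

Now, in the derivation of a lower bound for $\frac{d^2 f(tT)}{dt^2}$, we use the Taylor expansion of $\frac{ds_{jk}(t)}{dt}$
around $t = 0$. The first three derivatives of $s_{jk}(t)$ at $t = 0$ are given by

\[
\left. \frac{ds_{jk}(t)}{dt} \right|_{t=0} = 2 a_{jk} - 4 b_{jk}^2, \qquad
\left. \frac{d^2 s_{jk}(t)}{dt^2} \right|_{t=0} = 0, \qquad
\left. \frac{d^3 s_{jk}(t)}{dt^3} \right|_{t=0} = -16 b_{jk}^4 + 48 b_{jk}^2 a_{jk} - 12 a_{jk}^2
\]

and the expression for the fourth derivative is

\[
\begin{aligned}
\frac{d^4 s_{jk}(t)}{dt^4} ={}& e^{-(a_{jk} t^2 + 2 b_{jk} t)} \left( 16 (a_{jk} t + b_{jk})^5 - 80\, a_{jk} (a_{jk} t + b_{jk})^3 + 60\, a_{jk}^2 (a_{jk} t + b_{jk}) \right) \\
&+ e^{-(a_{jk} t^2 - 2 b_{jk} t)} \left( 16 (a_{jk} t - b_{jk})^5 - 80\, a_{jk} (a_{jk} t - b_{jk})^3 + 60\, a_{jk}^2 (a_{jk} t - b_{jk}) \right).
\end{aligned}
\]

It is easy to show that for all $t$, the magnitude of $\frac{d^4 s_{jk}(t)}{dt^4}$ can be bounded as

\[
\left| \frac{d^4 s_{jk}(t)}{dt^4} \right| \le 32.72\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}. \tag{143}
\]

From Taylor's Remainder Theorem,

\[
\frac{ds_{jk}(t)}{dt} = (2 a_{jk} - 4 b_{jk}^2) + (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2)\, t^2 + \left. \frac{d^4 s_{jk}(t)}{dt^4} \right|_{t=u} \frac{t^3}{6}
\]

where $u \in [0, t]$. Using the result (143) in this expression, we get

\[
\begin{aligned}
\frac{ds_{jk}(t)}{dt} &\ge (2 a_{jk} - 4 b_{jk}^2) + (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2)\, t^2 - 5.46\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^3 \\
-\frac{ds_{jk}(t)}{dt} &\ge -(2 a_{jk} - 4 b_{jk}^2) - (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2)\, t^2 - 5.46\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^3.
\end{aligned}
\]

Using these inequalities in (142), we obtain

\[
\begin{aligned}
\frac{d^2 f(tT)}{dt^2} \ge{}& \sum_{(j,k) \in J^+} \lambda_j \lambda_k Q_{jk} \left( (2 a_{jk} - 4 b_{jk}^2) + (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2)\, t^2 - 5.46\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^3 \right) \\
&+ \sum_{(j,k) \in J^-} |\lambda_j \lambda_k|\, Q_{jk} \left( -(2 a_{jk} - 4 b_{jk}^2) - (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2)\, t^2 - 5.46\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^3 \right) \\
={}& \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk} (2 a_{jk} - 4 b_{jk}^2) + \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk} (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2)\, t^2 \\
&- \sum_{j=1}^{K} \sum_{k=1}^{K} |\lambda_j \lambda_k|\, Q_{jk}\, 5.46\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}\, t^3.
\end{aligned}
\]

This yields

\[
\frac{d^2 f(tT)}{dt^2} \ge r_0 + r_2 t^2 + r_3 t^3 \tag{144}
\]

where

\[
r_0 = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk} (2 a_{jk} - 4 b_{jk}^2) \tag{145}
\]
\[
r_2 = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k Q_{jk} (-8 b_{jk}^4 + 24 b_{jk}^2 a_{jk} - 6 a_{jk}^2) \tag{146}
\]
\[
r_3 = -\sum_{j=1}^{K} \sum_{k=1}^{K} |\lambda_j \lambda_k|\, Q_{jk}\, 5.46\, \exp\!\left( \frac{b_{jk}^2}{a_{jk}} \right) a_{jk}^{5/2}. \tag{147}
\]

In the following, we compute uniform lower bounds $\underline{r}_0$, $\underline{r}_2$, $\underline{r}_3$ respectively for the terms $r_0$, $r_2$,
$r_3$, so that $\frac{d^2 f(tT)}{dt^2}$ can be uniformly lower bounded as

\[
\frac{d^2 f(tT)}{dt^2} \ge \underline{r}_0 + \underline{r}_2 t^2 + \underline{r}_3 t^3
\]

for all $T \in S^1$ and $t \in [0, t_0]$.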

• We begin with rg. Equation ([s]) implies that 

K K 

ro = EE^J^''-^J'=(2ajfe-46|fe) 

j=i k=i 

K K 

-EE ^^-^^^ Q^'^ (^^ ^7^' T-T^ (Tfe - r,)(r, - r,f T) 

3 = 1 = l 

= T^RoT 
where

K K 

^o = J2J2 ^^-^^ Q^'^ {^Jk - ^Jki^k T,){n - Tjf^Jk) ■ (148) 
J=l fc=l 

Now, since the distance function $f(tT)$ has a global minimum at $t = 0$, we have $d^2 f(tT)/dt^2 > 0$ along any direction $T$ at $t = 0$. Therefore, $r_0 > 0$ for any $T$, which shows that $R_0$ is a positive definite matrix. In particular, since $r_0 = T^T R_0 T$ with $\|T\| = 1$, the smaller eigenvalue of $R_0$ provides a uniform lower bound $\underline{r}_0$ for $r_0$:

$$r_0 \ge \underline{r}_0 = \lambda_{\min}(R_0) > 0. \qquad (149)$$

• Next, we examine the term r2. In order to lower bound r2, we need to compute lower and 
upper bounds for each of the terms 6^^., b'ji.ajk and a|j,. We have 



The matrix 

i?f = ^-^{Tk - r,){n - r.f^Jk' (150) 

is a rank-1 positive semi-definite matrix. Therefore, we obtain the following lower and 
upper bounds on bj^. 

b% > 4-4^mi„(i?f)=0 (151) 



16 
1 

16 



b% < = ^AL.(i?f ). (152) 



Since a,fc = 1 T"^ S -..^ T, we have 



Ojfe > a^k — ^A^in(^jfc ) 



Similarly, we obtain 



b%ajk > b^jkOLjk = gAmin(i?2'°)Amin(Sj,t^) = (153) 
^jfeOj/c < b]fM.jk = ^Amax(i?2'')Amax(S7;,^). (154) 



Using these inequalities in ( |146[ ), one can lower bound r2 as follows 



0-,fe)eJ+ 

+ X! •^J^fc<3ifc(245^fcajfc -6a|j^). 
(j\fe)eJ- 



Finally, we derive a lower bound for r^. Using ^ in (147), we have 



^3 = -}_^2^5.46|A,Afc|^i=e ■ 



Now, applying the Cauchy-Schwarz inequality for the inner product defined by the positive definite matrix $\Sigma_{jk}^{-1}$ in the definitions of $a_{jk}$, $b_{jk}$, $c_{jk}$, one can show that $b_{jk}^2 - a_{jk} c_{jk} \le 0$. Therefore, the exponential factor appearing in $r_3$ is upper bounded by 1. Observing also that



< A,„ax(S,fc^)) , we obtain the foUowing lower bound on 



K K 



sr^ \ ^ 5.46 I . , I 



iT\(jjak\ 



j=i fc=i 



25/2 



'Jk\ 



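Now, using the derived lower bounds for $r_0$, $r_2$ and $r_3$ in (144), we get

$$\frac{d^2 f(tT)}{dt^2} \ge \underline{r}_0 + \underline{r}_2\, t^2 + \underline{r}_3\, t^3 \qquad (155)$$

for all $t \ge 0$ and $T \in S^1$. In order to obtain a uniform lower bound for $d^2 f(tT)/dt^2$ that is independent of $t$ and valid for $t \in [0, t_0]$, one must take the signs of the coefficients into account: $\underline{r}_0 > 0$, $\underline{r}_3 < 0$, and $\underline{r}_2$ can take both signs. Defining $\tilde{r}_2 := \min(\underline{r}_2, 0) \le 0$, the bound in (155) can be slightly modified to yield the desired uniform lower bound

$$\frac{d^2 f(tT)}{dt^2} \ge \underline{r}_0 + \tilde{r}_2\, t_0^2 + \underline{r}_3\, t_0^3$$

for all $T \in S^1$ and $t \in [0, t_0]$.

□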
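The eigenvalue argument behind (149) can be illustrated numerically. The sketch below is not part of the original analysis: it uses a hypothetical $2 \times 2$ positive definite matrix in place of $R_0$ (the entries `A`, `B`, `D` are made up) and checks that the quadratic form $T^T R_0 T$ never drops below the smaller eigenvalue when $T$ ranges over unit vectors.

```python
import math

def sym2x2_eigenvalues(a, b, d):
    """Return (lambda_min, lambda_max) of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius

def quadratic_form(a, b, d, theta):
    """Evaluate T^T R T for the unit vector T = (cos(theta), sin(theta))."""
    tx, ty = math.cos(theta), math.sin(theta)
    return a * tx * tx + 2.0 * b * tx * ty + d * ty * ty

def min_over_directions(a, b, d, samples=3600):
    """Smallest value of T^T R T over a uniform grid of unit directions."""
    return min(quadratic_form(a, b, d, 2.0 * math.pi * i / samples)
               for i in range(samples))

# Hypothetical positive definite matrix standing in for R_0 (not from the paper).
A, B, D = 2.0, 0.5, 1.0
LAM_MIN, LAM_MAX = sym2x2_eigenvalues(A, B, D)
GRID_MIN = min_over_directions(A, B, D)
print(f"lambda_min = {LAM_MIN:.6f}, grid minimum of T^T R T = {GRID_MIN:.6f}")
```

The grid minimum approaches $\lambda_{\min}$ from above, which is the sense in which $\underline{r}_0$ is a bound that holds uniformly over directions.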



B.6 Proof of Lemma 3



Proof. In (30), since the terms

$$\int_{\mathbb{R}^2} p(X - tT)\, w(X - tT)\, dX \quad \text{and} \quad \int_{\mathbb{R}^2} w^2(X - tT)\, dX$$

do not have a dependence on $t$ and $T$, the first derivative of $h(tT)$ is given by
dh{tT) 



dt 



2^1 p{X)w{X ^ tT) dX = ^2 [ p{X + tT)w{X)dX 
dt J^2 dt J^2 

K L 
^Jt E ^^7. + tT) ^ 06 (X) dX 

■^^^ k=l 1 = 1 

K L 

2^EE^^-<^' / <l>^AX + tT)<l>^^{X)dX 

k=l 1=1 ■^"'^ 
^ K L 

2 ^ 51 E ^fc^' (-('^' + ~ uO^*fc(^i +tT- Tk)) 



k=l 1=1 



where $\Phi_k$ and $\kappa_k$ are as defined in (33) and (34). Denoting

$$q_{kl}(t) := \exp\left(-(\delta_l + tT - \tau_k)^T \Phi_k (\delta_l + tT - \tau_k)\right)$$

and differentiating the above expressions once more with respect to $t$, we obtain



d^hjtT) 
dt^ 



K L 



-2EEA.O 



k=l 1 = 1 



( Pqkijt) 
dt^ ■ 



Since $E[\zeta_l] = 0$,

$$E\left[\frac{d^2 h(tT)}{dt^2}\right] = 0. \qquad (156)$$

Therefore, the variance of $d^2 h(tT)/dt^2$ is given by



'^h"(tT) — ^ 



cfh{tT) \' 
dt^ ) 



= E 



K L 



iKk- 



k=l 1=1 



K L 
j = l m— 1 



d'^qjmjt) 
dt^ 



K K L 

fc=i j=i 



l=X 



d'^qkiit) d^qji{t) 
dt^ dt^ 



(157) 



where the last equality follows from the fact that $E[\zeta_l \zeta_m] \neq 0$ only if $l = m$. We remove the noise atom index $l$ and replace $\delta_l$ by $\delta$ for simplicity of notation. Defining

$$q_k(t) := \exp\left(-(\delta + tT - \tau_k)^T \Phi_k (\delta + tT - \tau_k)\right),$$

the expression in (157) can be rewritten as

K K 



j=i k=i 



d^qjjt) d^qkit) 
dt^ dt^ 



(158) 



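Now, we examine the term

$$E\left[\frac{d^2 q_j(t)}{dt^2}\, \frac{d^2 q_k(t)}{dt^2}\right].$$

It is easy to verify that

$$\frac{d^2 q_j(t)}{dt^2} = q_j(t)\left(4\, v_j^2(t) - 2\, T^T \Phi_j T\right)$$

where

$$v_j(t) = T^T \Phi_j (\delta + tT - \tau_j).$$

Hence

$$E\left[\frac{d^2 q_j(t)}{dt^2}\, \frac{d^2 q_k(t)}{dt^2}\right] = 16\, E\left[q_j(t) q_k(t) v_j^2(t) v_k^2(t)\right] - 8\, T^T \Phi_k T\; E\left[q_j(t) q_k(t) v_j^2(t)\right] - 8\, T^T \Phi_j T\; E\left[q_j(t) q_k(t) v_k^2(t)\right] + 4\, T^T \Phi_j T\; T^T \Phi_k T\; E\left[q_j(t) q_k(t)\right]. \qquad (159)$$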
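The closed form $d^2 q_j/dt^2 = q_j (4 v_j^2 - 2\, T^T \Phi_j T)$ that leads to (159) can be sanity-checked against a central finite difference. The following is a minimal sketch under assumed values: the matrix `PHI`, the direction `T_DIR`, the noise atom position `DELTA` and the atom center `TAU` are all hypothetical and chosen only for illustration.

```python
import math

def _offset(t, delta, T, tau):
    """w = delta + t*T - tau for 2-D vectors given as tuples."""
    return [delta[i] + t * T[i] - tau[i] for i in range(2)]

def _matvec(Phi, x):
    return [Phi[0][0] * x[0] + Phi[0][1] * x[1],
            Phi[1][0] * x[0] + Phi[1][1] * x[1]]

def q(t, delta, T, tau, Phi):
    """q(t) = exp(-(delta + tT - tau)^T Phi (delta + tT - tau))."""
    w = _offset(t, delta, T, tau)
    Pw = _matvec(Phi, w)
    return math.exp(-(w[0] * Pw[0] + w[1] * Pw[1]))

def q_second_derivative(t, delta, T, tau, Phi):
    """Closed form q(t) * (4 v(t)^2 - 2 T^T Phi T), with v(t) = T^T Phi w."""
    w = _offset(t, delta, T, tau)
    PT = _matvec(Phi, T)              # Phi is symmetric, so T^T Phi w = (Phi T)^T w
    v = PT[0] * w[0] + PT[1] * w[1]
    t_phi_t = T[0] * PT[0] + T[1] * PT[1]
    return q(t, delta, T, tau, Phi) * (4.0 * v * v - 2.0 * t_phi_t)

def finite_difference(t, delta, T, tau, Phi, h=1e-4):
    """Central second difference of q with respect to t."""
    return (q(t + h, delta, T, tau, Phi) - 2.0 * q(t, delta, T, tau, Phi)
            + q(t - h, delta, T, tau, Phi)) / (h * h)

# Hypothetical parameters (illustration only).
PHI = [[1.5, 0.3], [0.3, 0.8]]   # symmetric positive definite
T_DIR = (0.6, 0.8)               # unit direction
DELTA = (0.2, -0.1)
TAU = (0.5, 0.4)
ANALYTIC = q_second_derivative(0.3, DELTA, T_DIR, TAU, PHI)
NUMERIC = finite_difference(0.3, DELTA, T_DIR, TAU, PHI)
print(ANALYTIC, NUMERIC)
```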



The expression in (158) shows that an upper bound for $\sigma^2_{h''(tT)}$ can be derived by computing upper and lower bounds for each of the additive terms in (159).

First, observe that $E[q_k(t) q_j(t)]$ is equal to the term $B_{jk}$ defined in (99). We have already derived a lower bound $\underline{B}_{jk}$ and an upper bound $\overline{B}_{jk}$ for this term in the proof of Lemma 1. Next, one can notice that the terms $v_j^2(t)$ and $v_k^2(t)$ are of the form

$$v_j^2(t) = (\delta + tT - \tau_j)^T \left(\Phi_j T T^T \Phi_j\right) (\delta + tT - \tau_j)$$

where $\Phi_j T T^T \Phi_j$ is a rank-1, positive semi-definite matrix such that

$$\lambda_{\min}(\Phi_j T T^T \Phi_j) = 0 \qquad (160)$$
$$\lambda_{\max}(\Phi_j T T^T \Phi_j) \le \lambda_{\max}^2(\Phi_j). \qquad (161)$$

Therefore, there exist $\delta$ vectors for which $v_j^2(t)$ and $v_k^2(t)$ attain their lowest value 0. Consequently, it is not easy to derive positive lower bounds analytically for $E[q_j(t) q_k(t) v_j^2(t) v_k^2(t)]$, $E[q_j(t) q_k(t) v_k^2(t)]$ and $E[q_j(t) q_k(t) v_j^2(t)]$, which include the terms $v_j^2(t)$ and $v_k^2(t)$, unless one evaluates these expectations exactly. For the sake of simplicity, we therefore use the trivial lower bound 0 for these expectations in the rest of our analysis.

To complete the derivation, we compute upper bounds for each of the terms $E[q_j(t) q_k(t) v_j^2(t) v_k^2(t)]$, $E[q_j(t) q_k(t) v_k^2(t)]$ and $E[q_j(t) q_k(t) v_j^2(t)]$ in the following.

• We begin with the computation of an upper bound for $E[q_j(t) q_k(t) v_j^2(t) v_k^2(t)]$. Applying the Cauchy-Schwarz inequality, we get

$$E\left[q_j(t) q_k(t) v_j^2(t) v_k^2(t)\right] \le \sqrt{E\left[q_j^2(t) v_j^4(t)\right]}\, \sqrt{E\left[q_k^2(t) v_k^4(t)\right]}. \qquad (162)$$

Let us denote

$$\mathcal{E}_j := E\left[q_j^2(t) v_j^4(t)\right].$$

Now we compute an upper bound $\overline{\mathcal{E}}_j$ for $\mathcal{E}_j$. First, $q_j(t)$ can be upper bounded as

$$q_j(t) \le \exp\left(-\alpha_j \|\delta + tT - \tau_j\|^2\right)$$

where $\alpha_j$ denotes the smaller eigenvalue of $\Phi_j$ as defined in (101). From (161), we also have

$$v_j^2(t) \le \beta_j^2\, \|\delta + tT - \tau_j\|^2$$

where $\beta_j$ denotes the larger eigenvalue of $\Phi_j$. Using these inequalities, we can upper bound $\mathcal{E}_j$ as follows



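$$\mathcal{E}_j = \frac{1}{4b^2} \int_{-b}^{b}\!\int_{-b}^{b} q_j^2(t)\, v_j^4(t)\, d\delta_x\, d\delta_y$$
$$\le \frac{1}{4b^2} \int_{-b}^{b}\!\int_{-b}^{b} \exp\left(-2\alpha_j\left[(\delta_x + tT_x - \tau_{x,j})^2 + (\delta_y + tT_y - \tau_{y,j})^2\right]\right) \beta_j^4 \left[(\delta_x + tT_x - \tau_{x,j})^2 + (\delta_y + tT_y - \tau_{y,j})^2\right]^2 d\delta_x\, d\delta_y.$$

The evaluation of the above integrals yields

$$\mathcal{E}_j \le \overline{\mathcal{E}}_j := \frac{\beta_j^4}{4b^2} \left(L_j^x M_j^y + 2 N_j^x N_j^y + L_j^y M_j^x\right) \qquad (163)$$

where

$$L_j^x = -\exp\left(-2\alpha_j (b + tT_x - \tau_{x,j})^2\right) \left(\frac{(b + tT_x - \tau_{x,j})^3}{4\alpha_j} + \frac{3 (b + tT_x - \tau_{x,j})}{16\alpha_j^2}\right)$$
$$\quad + \exp\left(-2\alpha_j (-b + tT_x - \tau_{x,j})^2\right) \left(\frac{(-b + tT_x - \tau_{x,j})^3}{4\alpha_j} + \frac{3 (-b + tT_x - \tau_{x,j})}{16\alpha_j^2}\right)$$
$$\quad + \frac{3\sqrt{\pi}}{2^{11/2} \alpha_j^{5/2}} \left(\mathrm{erf}\left(\sqrt{2\alpha_j}\,(b + tT_x - \tau_{x,j})\right) - \mathrm{erf}\left(\sqrt{2\alpha_j}\,(-b + tT_x - \tau_{x,j})\right)\right) \qquad (164)$$

$$L_j^y = -\exp\left(-2\alpha_j (b + tT_y - \tau_{y,j})^2\right) \left(\frac{(b + tT_y - \tau_{y,j})^3}{4\alpha_j} + \frac{3 (b + tT_y - \tau_{y,j})}{16\alpha_j^2}\right)$$
$$\quad + \exp\left(-2\alpha_j (-b + tT_y - \tau_{y,j})^2\right) \left(\frac{(-b + tT_y - \tau_{y,j})^3}{4\alpha_j} + \frac{3 (-b + tT_y - \tau_{y,j})}{16\alpha_j^2}\right)$$
$$\quad + \frac{3\sqrt{\pi}}{2^{11/2} \alpha_j^{5/2}} \left(\mathrm{erf}\left(\sqrt{2\alpha_j}\,(b + tT_y - \tau_{y,j})\right) - \mathrm{erf}\left(\sqrt{2\alpha_j}\,(-b + tT_y - \tau_{y,j})\right)\right) \qquad (165)$$

$$M_j^x = \frac{\sqrt{\pi}}{2^{3/2} \alpha_j^{1/2}} \left(\mathrm{erf}\left(\sqrt{2\alpha_j}\,(b + tT_x - \tau_{x,j})\right) - \mathrm{erf}\left(\sqrt{2\alpha_j}\,(-b + tT_x - \tau_{x,j})\right)\right) \qquad (166)$$

$$M_j^y = \frac{\sqrt{\pi}}{2^{3/2} \alpha_j^{1/2}} \left(\mathrm{erf}\left(\sqrt{2\alpha_j}\,(b + tT_y - \tau_{y,j})\right) - \mathrm{erf}\left(\sqrt{2\alpha_j}\,(-b + tT_y - \tau_{y,j})\right)\right) \qquad (167)$$

$$N_j^x = -\exp\left(-2\alpha_j (b + tT_x - \tau_{x,j})^2\right) \frac{b + tT_x - \tau_{x,j}}{4\alpha_j} + \exp\left(-2\alpha_j (-b + tT_x - \tau_{x,j})^2\right) \frac{-b + tT_x - \tau_{x,j}}{4\alpha_j}$$
$$\quad + \frac{\sqrt{\pi}}{2^{7/2} \alpha_j^{3/2}} \left(\mathrm{erf}\left(\sqrt{2\alpha_j}\,(b + tT_x - \tau_{x,j})\right) - \mathrm{erf}\left(\sqrt{2\alpha_j}\,(-b + tT_x - \tau_{x,j})\right)\right) \qquad (168)$$

$$N_j^y = -\exp\left(-2\alpha_j (b + tT_y - \tau_{y,j})^2\right) \frac{b + tT_y - \tau_{y,j}}{4\alpha_j} + \exp\left(-2\alpha_j (-b + tT_y - \tau_{y,j})^2\right) \frac{-b + tT_y - \tau_{y,j}}{4\alpha_j}$$
$$\quad + \frac{\sqrt{\pi}}{2^{7/2} \alpha_j^{3/2}} \left(\mathrm{erf}\left(\sqrt{2\alpha_j}\,(b + tT_y - \tau_{y,j})\right) - \mathrm{erf}\left(\sqrt{2\alpha_j}\,(-b + tT_y - \tau_{y,j})\right)\right). \qquad (169)$$

Replacing the index $j$ with the index $k$ in the corresponding terms, we obtain the upper bound

$$\overline{\mathcal{E}}_k := \frac{\beta_k^4}{4b^2} \left(L_k^x M_k^y + 2 N_k^x N_k^y + L_k^y M_k^x\right)$$

for $\mathcal{E}_k$ such that

$$\mathcal{E}_k = E\left[q_k^2(t) v_k^4(t)\right] \le \overline{\mathcal{E}}_k. \qquad (170)$$

Using the bounds $\mathcal{E}_j \le \overline{\mathcal{E}}_j$ and $\mathcal{E}_k \le \overline{\mathcal{E}}_k$ in (162), we obtain the following upper bound for the term $E[q_j(t) q_k(t) v_j^2(t) v_k^2(t)]$

$$E\left[q_j(t) q_k(t) v_j^2(t) v_k^2(t)\right] \le \sqrt{\overline{\mathcal{E}}_j\, \overline{\mathcal{E}}_k}. \qquad (171)$$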



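• Now, we continue with the computation of an upper bound for $E[q_j(t) q_k(t) v_k^2(t)]$. Similarly to the previous case, the Cauchy-Schwarz inequality yields

$$E\left[q_j(t) q_k(t) v_k^2(t)\right] \le \sqrt{E\left[q_j^2(t)\right]}\, \sqrt{E\left[q_k^2(t) v_k^4(t)\right]}. \qquad (172)$$

An upper bound $\overline{\mathcal{E}}_k$ for $\mathcal{E}_k = E[q_k^2(t) v_k^4(t)]$ has already been given in (170). We denote

$$\mathcal{F}_j := E\left[q_j^2(t)\right] = \frac{1}{4b^2} \int_{-b}^{b}\!\int_{-b}^{b} q_j^2(t)\, d\delta_x\, d\delta_y$$

which can be bounded as

$$\mathcal{F}_j \le \overline{\mathcal{F}}_j$$

where

$$\overline{\mathcal{F}}_j := \frac{1}{4b^2} \int_{-b}^{b}\!\int_{-b}^{b} \exp\left(-2\alpha_j\left[(\delta_x + tT_x - \tau_{x,j})^2 + (\delta_y + tT_y - \tau_{y,j})^2\right]\right) d\delta_x\, d\delta_y.$$

The evaluation of the above integral yields

$$\overline{\mathcal{F}}_j = \frac{1}{4b^2}\, M_j^x M_j^y \qquad (173)$$

where $M_j^x$ and $M_j^y$ are as defined in (166) and (167). The sought upper bound for $E[q_j(t) q_k(t) v_k^2(t)]$ is thus given by

$$E\left[q_j(t) q_k(t) v_k^2(t)\right] \le \sqrt{\overline{\mathcal{F}}_j\, \overline{\mathcal{E}}_k}. \qquad (174)$$

• Finally, the term $E[q_j(t) q_k(t) v_j^2(t)]$ can be upper bounded using the symmetry with the previous case as

$$E\left[q_j(t) q_k(t) v_j^2(t)\right] \le \sqrt{\overline{\mathcal{E}}_j\, \overline{\mathcal{F}}_k}. \qquad (175)$$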



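Now, putting together the results (171), (174) and (175) and using the previous bounds for $B_{jk}$, we obtain the following upper and lower bounds for the term in (159)

$$E\left[\frac{d^2 q_j(t)}{dt^2}\, \frac{d^2 q_k(t)}{dt^2}\right] \le 16 \sqrt{\overline{\mathcal{E}}_j\, \overline{\mathcal{E}}_k} + 4\, T^T \Phi_j T\; T^T \Phi_k T\; \overline{B}_{jk} \qquad (176)$$

$$E\left[\frac{d^2 q_j(t)}{dt^2}\, \frac{d^2 q_k(t)}{dt^2}\right] \ge -8\, T^T \Phi_k T \sqrt{\overline{\mathcal{E}}_j\, \overline{\mathcal{F}}_k} - 8\, T^T \Phi_j T \sqrt{\overline{\mathcal{F}}_j\, \overline{\mathcal{E}}_k} + 4\, T^T \Phi_j T\; T^T \Phi_k T\; \underline{B}_{jk}. \qquad (177)$$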



Lastly, using the inequalities (176) and (177) in the expression in (158), we obtain the following upper bound on $\sigma^2_{h''(tT)}$



cr 



h"(tT) 



i3.k)eJ+ 



< ArfL 



Let 



ejk 16 J £j£k+^T'<^>,TT'^kTB 



jk 



s 



jk 



-8 •^kTxlEjFk 



(178) 

(179) 
(180) 



Then, the upper bound on (jf^n^^rp-^ can be rewritten as 

^ U,k)eJ+ 



XjXkKjKk^jk 

{j,k)e.J- 



To conclude the proof, it has already been explained in Appendix B.3 that $\overline{B}_{jk}$ and $\underline{B}_{jk}$ are bounded functions of $t$ and $T$. Similarly, in the expressions (164)-(169), since the polynomial terms are dominated by the decreasing exponential terms, and the erf(.) function is bounded by definition, $\overline{\mathcal{E}}_k$ and $\overline{\mathcal{F}}_k$ are also bounded functions of $t$ and $T$. Therefore, the coefficients defined in (179) and (180) are bounded functions of $t$ and $T$.

□ 



B.7 Proof of Corollary [2] 



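Proof. In order to obtain a uniform upper bound on $\sigma^2_{h''(tT)}$, we need to compute uniform upper bounds for $\sqrt{\overline{\mathcal{E}}_j \overline{\mathcal{E}}_k}$, $8\, T^T \Phi_k T \sqrt{\overline{\mathcal{E}}_j \overline{\mathcal{F}}_k}$, $4\, T^T \Phi_j T\; T^T \Phi_k T\; \overline{B}_{jk}$ and a uniform lower bound for $4\, T^T \Phi_j T\; T^T \Phi_k T\; \underline{B}_{jk}$. Remember that a uniform upper bound $\overline{\overline{B}}_{jk}$ and a uniform lower bound $\underline{\underline{B}}_{jk}$ for $B_{jk}$ have already been found in Appendix B.4. Below, we derive the bounds for the rest of these terms.

• First, since $\|T\| = 1$, it is easy to bound $T^T \Phi_j T$ as follows

$$T^T \Phi_j T \le \lambda_{\max}(\Phi_j) = \beta_j \qquad (181)$$
$$T^T \Phi_j T \ge \lambda_{\min}(\Phi_j) = \alpha_j. \qquad (182)$$

Using (182) and the uniform lower bound $\underline{\underline{B}}_{jk}$ on $B_{jk}$ given in (135), one can lower bound the term $4\, T^T \Phi_j T\; T^T \Phi_k T\; \underline{B}_{jk}$ as follows.

$$4\, T^T \Phi_j T\; T^T \Phi_k T\; \underline{B}_{jk} \ge 4\, \alpha_j \alpha_k\, \underline{\underline{B}}_{jk}. \qquad (183)$$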



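• We continue with the terms $\overline{\mathcal{E}}_j$ and $\overline{\mathcal{F}}_j$, whose expressions are given in (163) and (173) as

$$\overline{\mathcal{E}}_j = \frac{\beta_j^4}{4b^2} \left(L_j^x M_j^y + 2 N_j^x N_j^y + L_j^y M_j^x\right), \qquad \overline{\mathcal{F}}_j = \frac{1}{4b^2}\, M_j^x M_j^y.$$

In order to uniformly upper bound $\overline{\mathcal{E}}_j$ and $\overline{\mathcal{F}}_j$, it is sufficient to compute uniform upper bounds on the magnitudes of the terms $L_j^x$, $L_j^y$, $M_j^x$, $M_j^y$, $N_j^x$ and $N_j^y$. We start with $L_j^x$. From (164), it is seen that $L_j^x$ is in the form

$$L_j^x = -\frac{1}{4\alpha_j} \exp(-2\alpha_j u^2) \left(u^3 + \frac{3u}{4\alpha_j}\right) + \frac{1}{4\alpha_j} \exp(-2\alpha_j v^2) \left(v^3 + \frac{3v}{4\alpha_j}\right) + \frac{3\sqrt{\pi}}{2^{11/2} \alpha_j^{5/2}} \left(\mathrm{erf}(\sqrt{2\alpha_j}\, u) - \mathrm{erf}(\sqrt{2\alpha_j}\, v)\right) \qquad (184)$$

where

$$u = b + tT_x - \tau_{x,j}, \qquad v = -b + tT_x - \tau_{x,j}. \qquad (185)$$

It is easy to show that the maximum of

$$\exp(-2\alpha_j u^2) \left(u^3 + \frac{3u}{4\alpha_j}\right)$$

as a function of $u$ is attained at $u = 3^{1/4} / (2\sqrt{\alpha_j})$. Therefore,

$$\left|\exp(-2\alpha_j u^2) \left(u^3 + \frac{3u}{4\alpha_j}\right)\right| \le \frac{(3^{3/4} + 3^{5/4})\, e^{-\sqrt{3}/2}}{8}\, \frac{1}{\alpha_j^{3/2}} \qquad (186)$$

for all $u$. Also,

$$\left|\mathrm{erf}(\sqrt{2\alpha_j}\, u) - \mathrm{erf}(\sqrt{2\alpha_j}\, v)\right| \le 2. \qquad (187)$$

Now, using (186) and (187), and applying the triangle inequality in (184), we can bound the magnitude of the term $L_j^x$ as

$$|L_j^x| \le \overline{L}_j$$

where

$$\overline{L}_j := \left(\frac{(3^{3/4} + 3^{5/4})\, e^{-\sqrt{3}/2}}{16} + \frac{3\sqrt{\pi}}{2^{9/2}}\right) \frac{1}{\alpha_j^{5/2}}.$$

The magnitude of $L_j^y$ can be similarly bounded as

$$|L_j^y| \le \overline{L}_j.$$

Next, from (187) it follows that the terms $M_j^x$ and $M_j^y$ given in (166) and (167) can be uniformly bounded as

$$|M_j^x| \le \overline{M}_j, \qquad |M_j^y| \le \overline{M}_j$$

where

$$\overline{M}_j := \frac{\sqrt{\pi}}{2^{1/2} \alpha_j^{1/2}}.$$

Lastly, one can observe from (168) that $N_j^x$ has the form

$$N_j^x = -\frac{1}{4\alpha_j} \exp(-2\alpha_j u^2)\, u + \frac{1}{4\alpha_j} \exp(-2\alpha_j v^2)\, v + \frac{\sqrt{\pi}}{2^{7/2} \alpha_j^{3/2}} \left(\mathrm{erf}(\sqrt{2\alpha_j}\, u) - \mathrm{erf}(\sqrt{2\alpha_j}\, v)\right)$$

where $u$ and $v$ are as defined in (185). Using the fact that

$$\left|\exp(-2\alpha_j u^2)\, u\right| \le \frac{e^{-1/2}}{2\sqrt{\alpha_j}}$$

together with (187), the triangle inequality in the expression of $N_j^x$ gives

$$|N_j^x| \le \overline{N}_j$$

where

$$\overline{N}_j := \left(\frac{e^{-1/2}}{4} + \frac{\sqrt{\pi}}{2^{5/2}}\right) \frac{1}{\alpha_j^{3/2}}.$$

One can similarly show that

$$|N_j^y| \le \overline{N}_j.$$

Putting all these results together, we can now uniformly upper bound $\overline{\mathcal{E}}_j$ and $\overline{\mathcal{F}}_j$ as

$$\overline{\mathcal{E}}_j \le \overline{\overline{\mathcal{E}}}_j, \qquad \overline{\mathcal{F}}_j \le \overline{\overline{\mathcal{F}}}_j$$

where $\overline{\overline{\mathcal{E}}}_j$ and $\overline{\overline{\mathcal{F}}}_j$ are given by

$$\overline{\overline{\mathcal{E}}}_j = \frac{\beta_j^4}{2b^2} \left(\overline{L}_j\, \overline{M}_j + \overline{N}_j^2\right), \qquad \overline{\overline{\mathcal{F}}}_j = \frac{\overline{M}_j^2}{4b^2}. \qquad (188)$$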
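The elementary maxima behind (186), (187) and the bound on $|\exp(-2\alpha_j u^2)\, u|$ can be checked on a grid. This is only an illustrative sketch: the values of `alpha` and the grid parameters are arbitrary choices, and the function names are hypothetical.

```python
import math

def cubic_term(u, alpha):
    """exp(-2*alpha*u^2) * (u^3 + 3u/(4*alpha)), the expression bounded in (186)."""
    return math.exp(-2.0 * alpha * u * u) * (u ** 3 + 3.0 * u / (4.0 * alpha))

def cubic_bound(alpha):
    """Right-hand side of (186)."""
    return ((3.0 ** 0.75 + 3.0 ** 1.25) * math.exp(-math.sqrt(3.0) / 2.0)
            / (8.0 * alpha ** 1.5))

def linear_term(u, alpha):
    """exp(-2*alpha*u^2) * u, the expression bounded before N_j is estimated."""
    return math.exp(-2.0 * alpha * u * u) * u

def linear_bound(alpha):
    """exp(-1/2) / (2*sqrt(alpha))."""
    return math.exp(-0.5) / (2.0 * math.sqrt(alpha))

def erf_gap(u, v, alpha):
    """|erf(sqrt(2 alpha) u) - erf(sqrt(2 alpha) v)|, bounded by 2 in (187)."""
    s = math.sqrt(2.0 * alpha)
    return abs(math.erf(s * u) - math.erf(s * v))

def grid_max_abs(func, alpha, half_width=10.0, samples=20001):
    """Largest |func(u, alpha)| on a uniform grid over [-half_width, half_width]."""
    step = 2.0 * half_width / (samples - 1)
    return max(abs(func(-half_width + i * step, alpha)) for i in range(samples))

for alpha in (0.3, 1.0, 4.0):
    print(alpha, grid_max_abs(cubic_term, alpha), cubic_bound(alpha),
          grid_max_abs(linear_term, alpha), linear_bound(alpha))
```

On each tested `alpha` the grid maximum sits just below the closed-form bound, consistent with the maximizers $u = 3^{1/4}/(2\sqrt{\alpha_j})$ and $u = 1/(2\sqrt{\alpha_j})$.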



Now, we are ready to state upper bounds for \ £j£k, 8 <i>kTJ£jJ^k and 4 $jT <i>kTBjk 



8T'^kT^8,Fk < ^Pk^£,Fk 
AT^<i>jTT^^kTB,k < 'if3jl3kBjk 



Finally, using (183) and the above inequalities in the bound for cr^/z^tj^-) in (178), we obtain 



^ XjXkKjKk [ £j£k + ^PjPk Bjk 



+ Xj XkHj Kk [ -8 /3fe y £jFk - 8 /3j y F^Ek + ^a^Uk g.^ 
Let



jk 



"-3 k 



(189) 



l3k\l£jFk - 8 I3j\j FjEk + ^ajUk B 



■jk' 



We can then rewrite the upper bound on cr^,,^^^^,^ as follows, which concludes the proof 

<^h"{tT} < =4v'^l( XjXkKjKkejk+ ^3>^kKjKk Ijkj 



□



B.8 Proof of Theorem 3



Before proving the theorem, we first give below a few preliminary results that will be useful in the proof. Let $P(\cdot)$ denote probability, and let $\bar{t}_0$ and $s$ be given as in the statement of Theorem 3. We begin with stating Chebyshev's inequality.



Proposition 5. (Chebyshev's inequality) Let $x$ be a random variable, $\mu_x = E[x]$ be its mean and $\sigma_x^2$ be its variance. Then for $s > 0$,

$$P\left(|x - \mu_x| \ge s\, \sigma_x\right) \le \frac{1}{s^2}.$$
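As a quick illustration of Proposition 5, the sketch below measures the tail fraction of a sample and compares it with $1/s^2$. The exponential distribution, the seed and the sample size are arbitrary assumptions made for the example; Chebyshev's inequality also holds exactly for the empirical distribution when the sample mean and the (biased) sample variance are used, which is what the function computes.

```python
import math
import random

def tail_fraction(samples, s):
    """Fraction of samples with |x - mean| >= s * sigma (empirical mean and variance)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    sigma = math.sqrt(variance)
    return sum(1 for x in samples if abs(x - mean) >= s * sigma) / n

# Hypothetical data: 10000 draws from an exponential distribution with rate 1.
_rng = random.Random(0)
SAMPLES = [_rng.expovariate(1.0) for _ in range(10000)]

for s in (1.5, 2.0, 3.0):
    print(f"s = {s}: tail fraction = {tail_fraction(SAMPLES, s):.4f}, "
          f"Chebyshev bound = {1.0 / s ** 2:.4f}")
```

The observed tail fractions are typically far below $1/s^2$, which reflects that the probabilistic statements built on this inequality are conservative.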

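Next, we state and prove the following lemma that gives a lower bound on the second derivative of $f$.

Lemma 6. Assume that $t_0 \le \bar{t}_0$. Let $t_1$ be a real number such that

$$0 \le t_1 \le t_0 \le \bar{t}_0. \qquad (190)$$

Then,

$$\left.\frac{d^2 f(tT_0)}{dt^2}\right|_{t=t_1} \ge \frac{\underline{r}_0}{2}. \qquad (191)$$

Proof. Since $t_1 \in [0, t_0]$, due to Lemma 2 we have

$$\left.\frac{d^2 f(tT_0)}{dt^2}\right|_{t=t_1} \ge \underline{r}_0 + \tilde{r}_2\, t_0^2 + \underline{r}_3\, t_0^3 \qquad (192)$$

where $\underline{r}_0 > 0$, $\tilde{r}_2 \le 0$ and $\underline{r}_3 < 0$. From (192), a sufficient condition on $t_0$ that guarantees (191) is

$$\underline{r}_0 + \tilde{r}_2\, t_0^2 + \underline{r}_3\, t_0^3 \ge \frac{\underline{r}_0}{2}, \qquad (193)$$

or equivalently

$$|\tilde{r}_2|\, t_0^2 + |\underline{r}_3|\, t_0^3 \le \frac{\underline{r}_0}{2}. \qquad (194)$$

Thus, if we show that (194) holds, this will provide a sufficient condition for (191). Now we show that the condition (190) implies (194), which will conclude the proof.

First, if (190) holds,

$$t_0 \le \bar{t}_0 = \sqrt{\frac{\underline{r}_0}{2|\tilde{r}_2| + 2^{2/3}\, \underline{r}_0^{1/3}\, |\underline{r}_3|^{2/3}}} \le \sqrt{\frac{\underline{r}_0}{2^{2/3}\, \underline{r}_0^{1/3}\, |\underline{r}_3|^{2/3}}}.$$

Then, we have

$$t_0 \le \frac{\underline{r}_0^{1/3}}{2^{1/3}\, |\underline{r}_3|^{1/3}}. \qquad (195)$$

Next, due to the condition $t_0 \le \bar{t}_0$ in (190),

$$2|\tilde{r}_2|\, t_0^2 + 2^{2/3}\, \underline{r}_0^{1/3}\, |\underline{r}_3|^{2/3}\, t_0^2 \le \underline{r}_0$$
$$|\tilde{r}_2|\, t_0^2 + \left(\frac{\underline{r}_0}{2}\right)^{1/3} |\underline{r}_3|^{2/3}\, t_0^2 \le \frac{\underline{r}_0}{2}$$

which implies due to (195) that

$$|\tilde{r}_2|\, t_0^2 + |\underline{r}_3|\, t_0^3 \le \frac{\underline{r}_0}{2}.$$

We thus get the condition in (194) as we claimed.

□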
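The implication proved in Lemma 6 can be checked numerically: for any $t \in [0, \bar{t}_0]$ the cubic lower bound stays above $\underline{r}_0 / 2$. The sketch below assumes the expression of $\bar{t}_0$ used in the proof; the coefficient triples are hypothetical test values, not quantities computed from an image.

```python
import math

def t_bar(r0, r2, r3):
    """Threshold sqrt(r0 / (2|r2| + 2^(2/3) r0^(1/3) |r3|^(2/3))) from the proof."""
    denom = 2.0 * abs(r2) + 2.0 ** (2.0 / 3.0) * r0 ** (1.0 / 3.0) * abs(r3) ** (2.0 / 3.0)
    return math.sqrt(r0 / denom)

def curvature_lower_bound(r0, r2, r3, t):
    """Cubic lower bound r0 + r2 t^2 + r3 t^3 with r0 > 0, r2 <= 0, r3 < 0."""
    return r0 + r2 * t * t + r3 * t ** 3

def lemma6_holds(r0, r2, r3, steps=100):
    """Check the bound >= r0/2 on a grid of t values in [0, t_bar]."""
    tb = t_bar(r0, r2, r3)
    return all(curvature_lower_bound(r0, r2, r3, tb * i / steps) >= 0.5 * r0 - 1e-12
               for i in range(steps + 1))

# Hypothetical coefficient triples (r0, r2, r3).
for coeffs in [(1.0, -0.5, -2.0), (3.0, 0.0, -0.1), (0.2, -4.0, -0.01)]:
    print(coeffs, round(t_bar(*coeffs), 4), lemma6_holds(*coeffs))
```

The second triple has $\tilde{r}_2 = 0$, in which case the bound is attained with equality at $t = \bar{t}_0$; this shows that the threshold cannot be enlarged in general without losing the factor $1/2$.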



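Then, in the lemma below, we give a probabilistic upper bound for the magnitude of $\Delta h(t_0 T_0)$.

Lemma 7. Assume that $t_0 \le \bar{t}_0$. Then, with probability at least $1 - 1/s^2$,

$$|\Delta h(t_0 T_0)| < s\, R_{\sigma_{\Delta h}}. \qquad (196)$$

Proof. Due to Corollary 1, the variance of $\Delta h(t_0 T_0)$ is bounded as

$$\sigma^2_{\Delta h(t_0 T_0)} \le R_{\sigma^2_{\Delta h}}$$

since $t_0 \le \bar{t}_0$. Remember from (87) that $E[\Delta h(t_0 T_0)] = 0$. Then, from Chebyshev's inequality we have

$$P\left(|\Delta h(t_0 T_0)| \ge s\, \sigma_{\Delta h(t_0 T_0)}\right) \le \frac{1}{s^2}.$$

Since $\sigma_{\Delta h(t_0 T_0)} \le R_{\sigma_{\Delta h}} = \sqrt{R_{\sigma^2_{\Delta h}}}$,

$$P\left(|\Delta h(t_0 T_0)| \ge s\, R_{\sigma_{\Delta h}}\right) \le \frac{1}{s^2}.$$

Hence, with probability at least $1 - 1/s^2$,

$$|\Delta h(t_0 T_0)| < s\, R_{\sigma_{\Delta h}}.$$

□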



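Finally, we state and prove the following probabilistic bound for the second derivative of $h$.

Lemma 8. Assume that $t_0 \le \bar{t}_0$. Let $t_1$ be a real number such that

$$0 \le t_1 \le t_0 \le \bar{t}_0.$$

Let $h''(t_1 T_0)$ denote $\left. d^2 h(tT_0)/dt^2 \right|_{t=t_1}$. Then, with probability at least $1 - 1/s^2$,

$$|h''(t_1 T_0)| < s\, R_{\sigma_{h''}}. \qquad (197)$$

Proof. From Corollary 2, since $t_1 \in [0, \bar{t}_0]$, the variance of $h''(t_1 T_0)$ is bounded as

$$\sigma^2_{h''(t_1 T_0)} \le R_{\sigma^2_{h''}}. \qquad (198)$$

Remember from (156) that $E[h''(t_1 T_0)] = 0$. Then, Chebyshev's inequality gives the following probabilistic bound on $|h''(t_1 T_0)|$

$$P\left(|h''(t_1 T_0)| \ge s\, \sigma_{h''(t_1 T_0)}\right) \le \frac{1}{s^2}.$$

From (198) we have $\sigma_{h''(t_1 T_0)} \le R_{\sigma_{h''}} = \sqrt{R_{\sigma^2_{h''}}}$, which gives

$$P\left(|h''(t_1 T_0)| \ge s\, R_{\sigma_{h''}}\right) \le \frac{1}{s^2}. \qquad (199)$$

Therefore, $|h''(t_1 T_0)| < s\, R_{\sigma_{h''}}$ with probability at least $1 - 1/s^2$.

□

Now we are ready to prove our main result.

Proof of Theorem 3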



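Proof. We have shown in Section 4.1 that the distance $t_0$ between the global minima of $f$ and $g$ satisfies the condition in (31). We proceed with finding bounds for the terms in (31).

Assume that $t_0 \le \bar{t}_0$. First, as $t_1 \in [0, t_0]$ and $t_2 \in [0, t_0]$, due to Lemma 6 we have

$$\left.\frac{d^2 f(tT_0)}{dt^2}\right|_{t=t_1} \ge \frac{\underline{r}_0}{2}, \qquad \left.\frac{d^2 f(tT_0)}{dt^2}\right|_{t=t_2} \ge \frac{\underline{r}_0}{2}. \qquad (200)$$

Next, from Lemma 8,

$$\left.\frac{d^2 h(tT_0)}{dt^2}\right|_{t=t_2} \ge -s\, R_{\sigma_{h''}} \qquad (201)$$

with probability at least $1 - 1/s^2$. From (200) and (201), we have

$$\left.\frac{d^2 f(tT_0)}{dt^2}\right|_{t=t_1} + \left.\frac{d^2 f(tT_0)}{dt^2}\right|_{t=t_2} + \left.\frac{d^2 h(tT_0)}{dt^2}\right|_{t=t_2} \ge \underline{r}_0 - s\, R_{\sigma_{h''}} \qquad (202)$$

with probability at least $1 - 1/s^2$. Lastly, from Lemma 7,

$$|h(0) - h(t_0 T_0)| = |\Delta h(t_0 T_0)| \le s\, R_{\sigma_{\Delta h}} \qquad (203)$$

with probability at least $1 - 1/s^2$.

Using the union bound, the probability that both (203) and (202) hold is at least $1 - 2/s^2$. Therefore, from (31) we conclude that, with probability at least $1 - 2/s^2$,

$$t_0^2 \le \frac{2\, |h(0) - h(t_0 T_0)|}{\left.\dfrac{d^2 f(tT_0)}{dt^2}\right|_{t=t_1} + \left.\dfrac{d^2 f(tT_0)}{dt^2}\right|_{t=t_2} + \left.\dfrac{d^2 h(tT_0)}{dt^2}\right|_{t=t_2}} \le \frac{2 s\, R_{\sigma_{\Delta h}}}{\underline{r}_0 - s\, R_{\sigma_{h''}}}$$

where we have used (203) and (202) in the inequality above. This yields the upper bound $R_{t_0}$ on $t_0$

$$t_0 \le R_{t_0} := \sqrt{\frac{2 s\, R_{\sigma_{\Delta h}}}{\underline{r}_0 - s\, R_{\sigma_{h''}}}}$$

which holds with probability at least $1 - 2/s^2$.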

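Finally, we show that the condition $\eta \le \eta_0$ guarantees that

$$R_{t_0} \le \bar{t}_0$$

so that the assumption $t_0 \le \bar{t}_0$ that we made at the beginning of the proof is satisfied. This will conclude the proof. We have

$$R_{\sigma_{\Delta h}} = c_{\sigma_{\Delta h}}\, \eta, \qquad R_{\sigma_{h''}} = c_{\sigma_{h''}}\, \eta$$

which yields

$$R_{t_0} = \sqrt{\frac{2 s\, c_{\sigma_{\Delta h}}\, \eta}{\underline{r}_0 - s\, c_{\sigma_{h''}}\, \eta}}.$$

Thus, in order to have $R_{t_0} \le \bar{t}_0$, we need

$$\frac{2 s\, c_{\sigma_{\Delta h}}\, \eta}{\underline{r}_0 - s\, c_{\sigma_{h''}}\, \eta} \le \bar{t}_0^{\,2}.$$

Solving the above inequality, we get

$$\eta \le \eta_0 := \frac{\underline{r}_0\, \bar{t}_0^{\,2}}{2 s\, c_{\sigma_{\Delta h}} + s\, c_{\sigma_{h''}}\, \bar{t}_0^{\,2}}.$$

□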



C Proofs of alignment accuracy results for generalized 
noise model 

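C.1 Proof of Lemma 4

Proof. The second derivative of $p(X + tT)$ along the direction $T$ is given by

$$\frac{d^2 p(X + tT)}{dt^2} = \sum_{k=1}^{K} \lambda_k\, \frac{d^2 \phi_{\gamma_k}(X + tT)}{dt^2}. \qquad (204)$$

It is easy to show that

$$\frac{d^2 \phi_{\gamma_k}(X + tT)}{dt^2} = \phi_{\gamma_k}(X + tT) \left(4 \left[(X - \tau_k + tT)^T \Theta_k T\right]^2 - 2\, T^T \Theta_k T\right) \qquad (205)$$

where $\Theta_k = \Psi_k\, \sigma_k^{-2}\, \Psi_k^{-1}$. Then, the squared norm of the second derivative of $p(X + tT)$ is given by

$$\left\|\frac{d^2 p(X + tT)}{dt^2}\right\|^2 = \int_{\mathbb{R}^2} \left(\sum_{k=1}^{K} \lambda_k\, \phi_{\gamma_k}(X + tT) \left(4 \left[(X - \tau_k + tT)^T \Theta_k T\right]^2 - 2\, T^T \Theta_k T\right)\right)^2 dX.$$

This yields

$$\left\|\frac{d^2 p(X + tT)}{dt^2}\right\|^2 = \sum_{j=1}^{K} \sum_{k=1}^{K} \lambda_j \lambda_k \left(16\, \mathcal{G}_{jk} - 8\, \mathcal{H}_{jk} - 8\, \mathcal{H}_{kj} + 4\, \mathcal{K}_{jk}\right) \qquad (206)$$

where

$$\mathcal{G}_{jk} = \int_{\mathbb{R}^2} \phi_{\gamma_j}(X)\, \phi_{\gamma_k}(X) \left((X - \tau_j)^T \Theta_j T\right)^2 \left((X - \tau_k)^T \Theta_k T\right)^2 dX \qquad (207)$$

$$\mathcal{H}_{jk} = \int_{\mathbb{R}^2} \phi_{\gamma_j}(X)\, \phi_{\gamma_k}(X) \left((X - \tau_k)^T \Theta_k T\right)^2\, T^T \Theta_j T\; dX \qquad (208)$$

$$\mathcal{K}_{jk} = \int_{\mathbb{R}^2} \phi_{\gamma_j}(X)\, \phi_{\gamma_k}(X)\; T^T \Theta_j T\; T^T \Theta_k T\; dX. \qquad (209)$$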

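We proceed by finding uniform upper and lower bounds for $\mathcal{G}_{jk}$, $\mathcal{H}_{jk}$, $\mathcal{K}_{jk}$ that are independent of $T$, so that the norm of the second derivative of $p$ can be uniformly upper bounded. Since the terms $\left((X - \tau_j)^T \Theta_j T\right)^2$ can take the value 0 for some $X$ vectors, we use the trivial lower bound 0 for $\mathcal{G}_{jk}$ and $\mathcal{H}_{jk}$. In the following, we compute uniform upper bounds $\overline{\mathcal{G}}_{jk}$, $\overline{\mathcal{H}}_{jk}$, $\overline{\mathcal{K}}_{jk}$ and a uniform lower bound $\underline{\mathcal{K}}_{jk}$ for the above terms. First, let us define

$$\iota_k = \lambda_{\min}(\Theta_k) = \min(\sigma_{x,k}^{-2}, \sigma_{y,k}^{-2}) \qquad (210)$$
$$\vartheta_k = \lambda_{\max}(\Theta_k) = \max(\sigma_{x,k}^{-2}, \sigma_{y,k}^{-2}). \qquad (211)$$

Since $\|T\| = 1$, we have

$$\left((X - \tau_k)^T \Theta_k T\right)^2 \le \vartheta_k^2\, \|X - \tau_k\|^2$$
$$T^T \Theta_k T \le \vartheta_k.$$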

We start with Gjk- From (207), we have 

G,k < I 0^,(X)</>^,(X) dpi \\X - T,f\\X - Tfell 



'dX 



<Gjk ■■^^pl\/Gj\/Gk 



(212) 



where 



Gk ■■= 



02 (X)||X-rfe||4dX 



and the second inequality in (212) follows from the Cauchy-Schwarz inequality. Evaluating the 
integral above, one obtains 



Gk = 



16 



x.k 



^x,k^y,k 



'y,k 



(213) 



which completes the derivation of Gjk- Next, we compute Hjk- From (208), 

H,k < I cj>^^{X)c^^^{X) djdl \\X - TkfdX 



(214) 



Since U^^{X)\\ = 



H,k^^A\ 



'^\crj\Gk 



(215) 
Finally, we derive $\overline{\mathcal{K}}_{jk}$ and $\underline{\mathcal{K}}_{jk}$. Since



it follows from ( 209 ) that 



(216) 



where Qjk is as defined in Using these bounds in the expression in (206), we obtain 



IR2 



d^p{X + tT) 



Let 



dX< AjAfe(16g,fc + 4/C,fc) 

U,k)eJ- 



§jk ■= IGGjk + 

h,k ■■= -snjk-snkj+iiCjk- 



We can then rewrite the above bound as 



d^p{X + tT) 
di^ 



dX < Rl„ = Y AjAfcgjfe+ ^ AjAfeh^fe 

(j,*:)eJ+ {j,k)£J- 



which concludes the proof of the lemma. 
C.2 Proof of Theorem 4

Proof. The deviation function for the general case is given by

$$h_g(tT) = \int_{\mathbb{R}^2} z^2(X - tT)\, dX - 2 \int_{\mathbb{R}^2} \left(p(X) - p(X - tT)\right) z(X - tT)\, dX. \qquad (217)$$



Following the same steps as in Section [4~H one can obtain 
ul ( d^fitUo) , d^itUo) , d^hgitUo) 



dt^ 



t=ti 



dt^ 



t=t2 



dt^ 



t=tl, 



where ii, <2 G [0, uq]. 

Assume that Uq < to- Then, from Lemma [6] 



d'fitUo) 



dt^ 







t=ti 



- 2 



d'fitUo) 



dt^ 



In the following, we find upper bounds for \hg{0) — hg{uQUo)\ and ^ '^dt^^°^ 
compute Rug- We begin with \hg{0) — hg{uoUo)\- 



□ 



(217) 



\hgiO) - hg{uoUo)\ (218) 



(219) 



in order to 



t=ti 



\hgiO) - hg{U0U0)\ ^ 2 



(piX) - p{X - uoUo)) z{X - uoUo)dX 



(220) 
Using the Cauchy-Schwarz inequality in the inner product above, we can bound $|h_g(0) - h_g(u_0 U_0)|$ as follows.

$$|h_g(0) - h_g(u_0 U_0)| \le 2\, \|p(X) - p(X - u_0 U_0)\|\; \|z(X - u_0 U_0)\| \le 4 R_p\, \nu. \qquad (221)$$



Next, we find an upper bound on 
hg{tT) along the direction T is given by 



d'^hgitUg) 



From (217), the second derivative of 



(Phg{tT) 



dt2 



-2 



z{X)dX. 



p{X + tT)z{X)dX 



IR2 

(Pp{X + tT) 

IR2 (MP' 



Therefore, 



(Phg{tT) 



dt2 



R2 



(Pp{X + tT) 



z{X)dX 



< 2 



d^p{X + tT) 



dt^ 

(Pp{X + tT) 
dp 



4X)\\ 



Here we use the result of Lemma 4, which states that the norm of $d^2 p(X + tT)/dt^2$ can be uniformly upper bounded by $R_{p''}$ along all directions $T$ and for all $t$ values. Using this result in the above inequality and setting $t = t_2$, $T = U_0$, we obtain



d^hgitUn) 




dt^ 


t=tl 



Equations (2191 and (222) together imply 
<Pf{tUo) , d^f{tU^) 



dt^ 



df^ 



dt^ 



> Tn — 2Rr,"V. 



(222) 



(223) 



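From (218),

$$u_0^2 \le \frac{2\, |h_g(0) - h_g(u_0 U_0)|}{\left.\dfrac{d^2 f(tU_0)}{dt^2}\right|_{t=t_1} + \left.\dfrac{d^2 f(tU_0)}{dt^2}\right|_{t=t_2} + \left.\dfrac{d^2 h_g(tU_0)}{dt^2}\right|_{t=t_2}}.$$

Using the inequalities (221) and (223) above, we get

$$u_0^2 \le \frac{8 R_p\, \nu}{\underline{r}_0 - 2 R_{p''}\, \nu}.$$

This gives the upper bound $R_{u_0}$ stated in the theorem

$$u_0 \le R_{u_0} = \sqrt{\frac{8 R_p\, \nu}{\underline{r}_0 - 2 R_{p''}\, \nu}}.$$

What remains is to show that the condition $\nu \le \nu_0$ on the noise magnitude guarantees that the assumption $u_0 \le \bar{t}_0$ is true. It is sufficient to solve

$$R_{u_0}^2 = \frac{8 R_p\, \nu}{\underline{r}_0 - 2 R_{p''}\, \nu} \le \bar{t}_0^{\,2}$$

which gives

$$\nu \le \nu_0 = \frac{\underline{r}_0\, \bar{t}_0^{\,2}}{8 R_p + 2 R_{p''}\, \bar{t}_0^{\,2}}$$

as stated in the theorem. This finishes the proof.

□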
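The relation between the error bound $R_{u_0}$ and the admissible noise level $\nu_0$ can be made concrete with a small numerical sketch. All numbers below are hypothetical stand-ins for $\underline{r}_0$, $R_p$, $R_{p''}$ and $\bar{t}_0$; the function names are illustrative and do not come from the original work.

```python
import math

def alignment_error_bound(r0, R_p, R_pp, nu):
    """R_u0 = sqrt(8 R_p nu / (r0 - 2 R_p'' nu)); requires a positive denominator."""
    denom = r0 - 2.0 * R_pp * nu
    if denom <= 0.0:
        raise ValueError("noise level too large: r0 - 2 R_p'' nu must be positive")
    return math.sqrt(8.0 * R_p * nu / denom)

def noise_threshold(r0, R_p, R_pp, t0_bar):
    """nu_0 = r0 t0_bar^2 / (8 R_p + 2 R_p'' t0_bar^2), the largest admissible noise norm."""
    return r0 * t0_bar ** 2 / (8.0 * R_p + 2.0 * R_pp * t0_bar ** 2)

# Hypothetical values (illustration only).
R0, RP, RPP, T0_BAR = 1.0, 0.8, 1.5, 0.5
NU0 = noise_threshold(R0, RP, RPP, T0_BAR)
print(f"nu_0 = {NU0:.6f}")
print(f"R_u0 at nu_0     = {alignment_error_bound(R0, RP, RPP, NU0):.6f}")
print(f"R_u0 at nu_0 / 2 = {alignment_error_bound(R0, RP, RPP, 0.5 * NU0):.6f}")
```

At $\nu = \nu_0$ the bound equals $\bar{t}_0$ exactly, and it decreases for smaller noise norms, which is the consistency requirement checked at the end of the proof.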
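C.3 Proof of Corollary 3

Proof. We build on the proof of Theorem 4 given in Appendix C.2. Notice from (220) that $|h_g(0) - h_g(u_0 U_0)|$ can be upper bounded as

$$|h_g(0) - h_g(u_0 U_0)| = 2 \left|\int_{\mathbb{R}^2} \left(p(X) - p(X - u_0 U_0)\right) z(X - u_0 U_0)\, dX\right|$$
$$= 2 \left|\int_{\mathbb{R}^2} \left(p(X + u_0 U_0) - p(X)\right) z(X)\, dX\right|$$
$$\le 2 \left|\int_{\mathbb{R}^2} p(X + u_0 U_0)\, z(X)\, dX\right| + 2 \left|\int_{\mathbb{R}^2} p(X)\, z(X)\, dX\right|.$$

Using the uniform bound $r_{pz}$ for the inner products above, we obtain

$$|h_g(0) - h_g(u_0 U_0)| \le 4\, r_{pz}.$$

Assume that $u_0 \le \bar{t}_0$. Then, applying the same derivation steps as in Appendix C.2 and replacing the term $4 R_p\, \nu$ with $4\, r_{pz}$, we obtain

$$u_0 \le Q_{u_0} = \sqrt{\frac{8\, r_{pz}}{\underline{r}_0 - 2 R_{p''}\, \nu}}$$

as stated in the corollary. Next, we look at the largest admissible value of the noise level. In order to have $u_0 \le \bar{t}_0$, it is sufficient to solve

$$\frac{8\, r_{pz}}{\underline{r}_0 - 2 R_{p''}\, \nu} \le \bar{t}_0^{\,2}$$

which gives

$$\nu \le \frac{\underline{r}_0\, \bar{t}_0^{\,2} - 8\, r_{pz}}{2 R_{p''}\, \bar{t}_0^{\,2}}.$$

In order for this threshold to have a positive value, the correlation bound $r_{pz}$ must be sufficiently small to satisfy

$$r_{pz} < \frac{\underline{r}_0\, \bar{t}_0^{\,2}}{8}.$$

□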



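D Proofs of alignment error variation results for Gaussian noise model

D.1 Proof of Lemma 5

Before starting the proof of Lemma 5, we state and prove the following proposition about $\hat{\bar{t}}_0$, which is implicitly involved in most results.

Proposition 6. The smoothed value $\hat{\bar{t}}_0$ of $\bar{t}_0$ has the following dependence on $\rho$

$$\hat{\bar{t}}_0 = O\left((1 + \rho^2)^{1/2}\right).$$

Proof. From the definition of $\bar{t}_0$ in (40), its smoothed value $\hat{\bar{t}}_0$ is given by

$$\hat{\bar{t}}_0 := \sqrt{\frac{\hat{r}_0}{2 |\hat{r}_2| + 2^{2/3}\, \hat{r}_0^{1/3}\, |\hat{r}_3|^{2/3}}} \qquad (224)$$

where $\hat{r}_0$, $\hat{r}_2$, $\hat{r}_3$ denote the smoothed versions of the bounds $\underline{r}_0$, $\tilde{r}_2$, $\underline{r}_3$. From (149) and (148), we have

$$\hat{r}_0 = \lambda_{\min}(\hat{R}_0)$$

where

$$\hat{R}_0 = \sum_{j=1}^{K} \sum_{k=1}^{K} \hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} \left(\hat{\Sigma}_{jk}^{-1} - \hat{\Sigma}_{jk}^{-1} (\tau_k - \tau_j)(\tau_k - \tau_j)^T \hat{\Sigma}_{jk}^{-1}\right). \qquad (225)$$

Remember from (69) and (68) that $\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} = O\left((1 + \rho^2)^{-1}\right)$ and

$$\lambda_{\min}(\hat{\Sigma}_{jk}^{-1}) = O\left((1 + \rho^2)^{-1}\right), \qquad \lambda_{\max}(\hat{\Sigma}_{jk}^{-1}) = O\left((1 + \rho^2)^{-1}\right).$$

Therefore, $\hat{R}_0 = O\left((1 + \rho^2)^{-2}\right)$. We thus obtain

$$\hat{r}_0 = O\left((1 + \rho^2)^{-2}\right) \qquad (226)$$

which also proves the first statement of Lemma 5.

Next, we derive the dependence of $|\hat{r}_2|$. Since $\lambda_{\max}(\hat{\Sigma}_{jk}^{-1}) = O\left((1 + \rho^2)^{-1}\right)$, the maximum eigenvalue of the smoothed version of the term $R_2^{jk}$ in (150) is $\lambda_{\max}(\hat{R}_2^{jk}) = O\left((1 + \rho^2)^{-2}\right)$. Hence, equations (151)-(154) give

$$\overline{b}_{jk}^{\,4} = O\left((1 + \rho^2)^{-4}\right), \qquad \overline{a}_{jk}^{\,2} = O\left((1 + \rho^2)^{-2}\right), \qquad \underline{a}_{jk}^{\,2} = O\left((1 + \rho^2)^{-2}\right), \qquad \overline{b}_{jk}^{\,2}\, \overline{a}_{jk} = O\left((1 + \rho^2)^{-3}\right)$$

for the smoothed terms. Therefore, the smoothed lower bound on $r_2$ is $O\left((1 + \rho^2)^{-3}\right)$. Since $\tilde{r}_2 = \min(\underline{r}_2, 0) \le 0$, we have

$$|\hat{r}_2| \le O\left((1 + \rho^2)^{-3}\right). \qquad (227)$$

Lastly, the expression of the lower bound on $r_3$ gives $|\hat{r}_3| = O\left(|\hat{\lambda}_j \hat{\lambda}_k| \hat{Q}_{jk}\right) O\left(\lambda_{\max}^{5/2}(\hat{\Sigma}_{jk}^{-1})\right)$. We thus have

$$|\hat{r}_3| = O\left((1 + \rho^2)^{-7/2}\right). \qquad (228)$$

The results (226), (227), (228) together with (224) give

$$\hat{\bar{t}}_0 = O\left((1 + \rho^2)^{1/2}\right).$$

□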
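The exponent bookkeeping in Proposition 6 can be reproduced numerically. The sketch below assumes, purely for illustration, that the three smoothed coefficients scale exactly as $(1+\rho^2)^{-2}$, $(1+\rho^2)^{-3}$ and $(1+\rho^2)^{-7/2}$ with hypothetical constants `c0`, `c2`, `c3`; the $O(\cdot)$ statements in the proof only assert these rates as bounds, so this is a consistency check on the arithmetic rather than a statement about any specific image.

```python
import math

def t_bar(r0, r2, r3):
    """sqrt(r0 / (2|r2| + 2^(2/3) r0^(1/3) |r3|^(2/3))), as in (224)."""
    denom = 2.0 * abs(r2) + 2.0 ** (2.0 / 3.0) * r0 ** (1.0 / 3.0) * abs(r3) ** (2.0 / 3.0)
    return math.sqrt(r0 / denom)

def smoothed_t_bar(rho, c0=1.0, c2=0.7, c3=0.4):
    """Threshold when the coefficients follow the rates (226)-(228) exactly (assumption)."""
    s = 1.0 + rho * rho
    return t_bar(c0 * s ** -2.0, -c2 * s ** -3.0, -c3 * s ** -3.5)

def normalized_threshold(rho):
    """smoothed_t_bar divided by (1 + rho^2)^(1/2); constant if the claimed rate holds."""
    return smoothed_t_bar(rho) / math.sqrt(1.0 + rho * rho)

for rho in (0.0, 1.0, 5.0, 20.0):
    print(rho, smoothed_t_bar(rho), normalized_threshold(rho))
```

The normalized value does not change with `rho`, which matches the rate $(1 + \rho^2)^{1/2}$: the region over which the curvature bound holds grows linearly with the filter size for large $\rho$.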

Now we can prove the lemma.

Proof of Lemma 5



Proof. From (37 1, the smoothed variance bound is ^^o-^^ — ^o-^^'? ? which yields 



""Ah ^fAft 



(229) 



Remember from (54) that f) = 0(?7P )• In the following, we look at the variation of 



°^Ah 



= 4L| X/ Aj^ AfeKfc Cjfe + ^ XjkjXkKk^jk 

{j:k)eJ+ {3,k)<£j- 
with $\rho$. Observe that



Xjkj — 



A, 



with respect to p. Next, we look at the variations of 



7rA,|a,||^| 



= 0(1) 



(230) 



(231) 



which require the computation of the dependences of Bjk, C .,, i'jk, B , Cjk, 2^,^ on p. 



j''' kjk' •^J'^' =jfc' ^■J''^' 

Since ctj and /3j are respectively the smaller and greater eigenvalues of $j = '^ji'^j 



)^ , we have 



From (1291, 



= 0((l + pr^). 



■ exp 



/3,=0((l + p2)-i) 



a^afe - Tj\ 



(232) 



((ij + (ife) 



462 (q,^. 

The term with the minus sign in the exponential in the above equation is 0{aj) = O (^{1 + p^)~^) , 
which approaches as p increases. Therefore, exp ^— 
Hence, 

Ijk^O{aj') = 0{l + p^). 



= 0(1) with respect to p. 

(233) 



Then, from (1351, B is given by 



B .= exp -biu - b",, — X — 



where 



&"fc = + /3fe) max <^ [b + to - 



-h - to - 



PjTx.j + PkTx.k 



and bjj, has the same form as b^^,. Since £q = O (l + p^) from Proposition joj and /3j = 

0((l + p2)-i), we have bj^ = 0(1), b^;^. = 0(1) with respect to p. The term 

in the expression of B^^. has the same order as therefore, it approaches as p increases. We 
thus get 



i,. = 0(1) 



(234) 



with respect to p. 



We continue with C,fc and C . From (136), 



Cjk 



Therefore, 



452 (dj + OLk) 

d,fe = o(i + p2). 



(235) 



(236) 
Due to (141), the uniform lower bound on $C_{jk}$ is given by
— 



where 



ra , o \ J I u , i 1^3 l3jTx,j + l3kTx,k 



•>jk 



k + h 



(3 J + Pk 13 J +Pk J 

|(-io + Tx,k — Tx.jY, {to + Tx,k — Tx.j)^^ 



-b-l 



0^ 



f3j f3]TxJ + PkTx.k 



/3, + Pk 



and c^^ and t)|^ are of the same form. In a manner similar to the derivation made for b^^,, one 



can show that c^^., cj^, 0^^,, Oj^^. are 0(1) with respect to p. Therefore, 

i,k = o{i). 



(237) 



Lastly, we examine the terms P^-j. and "Djk. From (119) 



-J 



—a; — y 



where 



2? 



1 



/3j + Pk 



exp 



(-(/3,+/3fe)(G;,(0,0)-(^;,(0,0))2)) 



erf + (5 + ^^-^(0, 0)) - erf + /3fc (-6 + ^^,,(0, 0)) 



and J. are in the same form as "D^jk ■ From the expressions of H/^jk ^-nd GJ^, in ( 105 ) , one can see 
that Hiji^{0, 0) and (7^^,(0, 0) are 0(1). As the term /Sj + /3k in the exponential approaches as p 



mcreases, exp (-(/?, + pk) (g%{0, 0) - (Hj\(0, 0))^ 
of the erf (.) function, 



= 0(1). Next, due to the boundedness 



erf Vft- + {b + H,,iO, 0)) - erf Jft' + l^k {-b + H^,{0, 0)) 



<0(1). 



Therefore the dependence of P a- on p is upper bounded by the dependence of . on p. 

Hence, < O ((1 + p^f'^). As P^^j, < O ((1 + p^f^^) as well, 



^,fc<0(l + p2) 



(238) 



One can similarly show that 

i>jk < O (1 + p") . (239) 
Finally, putting together the results p33| , ( p34| , p36| ), p37| , p38| , ( [2391 ) ™ ( p3l| ) yields 

£jfe = O (l + p^) 
4fc = 0(l + p2). 
Hence,



Combining this with fj = 0{r]p ) in (229) we conclude that 



(240) 

□ 



D.2 Proof of Theorem [5] 

Proof. We would like to characterize the dependence of 



4 = J^l3:i^ 

on p and ry. We have already examined the variations of Rg_^ and Tq in Appendix 
remains to derive the dependence of Ra,^,, on p and rj. From (39), 



(241) 



D.l 



It 



(242) 



We thus need to determine the variation of Co-,^„ with p. From the exact expressions of ejfc and 
l^-fe in ([189]), 



^ \j\kkjkk (w^jijik +4:PjPk Bjk\ 
U,k)eJ+ \ / 



+ ^^^^kkjfik -8 (3k\l£jJ'k -8 I3j\l Fj£k + Aa^ak g 

U,k)eJ- \ 



(243) 



:jk 



We have already examined the variation of Xjkj, ci,-, /?,-, S and Bjk in Appendix 

— jk 



Therefore, we need to evaluate £j and J-j. From (188), these terms are given by 



D.l 



where 



(3-V4 + 35/4)e-# 3V^\ 1 



16 

,/7r \ 1 



29/2 ; 5/2 



4 25/2 ; 3/2 ■ 

/ j 
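As $\hat{\alpha}_j = O\left((1 + \rho^2)^{-1}\right)$, the smoothed versions of $\overline{L}_j$, $\overline{M}_j$ and $\overline{N}_j$ satisfy

$$\overline{L}_j = O\left((1 + \rho^2)^{5/2}\right), \qquad \overline{M}_j = O\left((1 + \rho^2)^{1/2}\right), \qquad \overline{N}_j = O\left((1 + \rho^2)^{3/2}\right).$$

Remembering that $\hat{\beta}_j = O\left((1 + \rho^2)^{-1}\right)$, we obtain

$$\overline{\overline{\mathcal{E}}}_j = O\left((1 + \rho^2)^{-1}\right), \qquad \overline{\overline{\mathcal{F}}}_j = O\left(1 + \rho^2\right)$$

which yields

$$\overline{\overline{\mathcal{E}}}_j\, \overline{\overline{\mathcal{F}}}_j = O(1).$$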



of Cj^a in ( 243 1 , we obtain 



Combining the results ( 230 ) , ( 232 1 , ( 233 ) and ( 234 1 derived in Appendix D. 1 with the expression 



h" 
Crr. 



0((l+p^)-l/2). 



Putting this result together with t) = 0{rip ^) in (242) yields 



(244) 



We can now put together the results (240), (226), (|244[) in the smoothed bound (241) as 



follows, which gives the variation of the alignment error bound with r/ and p 

Rto = o 



r]p- 



(1 + p2) ^ — f] p 



= O 



VP'" 

I-Tip 



□ 



E Proofs of alignment error variation results for general- 
ized noise model 

E.l Proof of Theorem |6] 

Before starting the proof of the theorem, we state and prove the following proposition about 
the norm of the reference pattern and the norm of its second derivative. 

Proposition 7. The variation of the norm $\hat{R}_p$ of the smoothed pattern $\hat{p}$ with respect to $\rho$ is

$$\hat{R}_p = O\left((1 + \rho^2)^{-1/2}\right).$$

The smoothed uniform bound on the norm of the second derivative of $\hat{p}$ has the dependence

$$\hat{R}_{p''} = O\left((1 + \rho^2)^{-3/2}\right)$$

on $\rho$.
Proof. We begin with $\hat{R}_p$. First, from (44), the squared norm of the unfiltered pattern $p$ is given by

K K ^ 



3 = 1 k = l 



Therefore, 



K K ^ 



3 = 1 k = l 



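Remember from (69) that $\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} = O\left((1 + \rho^2)^{-1}\right)$. This gives

$$\hat{R}_p^2 = O\left((1 + \rho^2)^{-1}\right)$$

and

$$\hat{R}_p = O\left((1 + \rho^2)^{-1/2}\right). \qquad (245)$$

We now determine the dependence of $\hat{R}_{p''}$ on $\rho$. From (45), the smoothed value of $R_{p''}$ is given by

$$\hat{R}_{p''}^2 = \sum_{(j,k) \in J^+} \hat{\lambda}_j \hat{\lambda}_k\, \hat{\mathbf{g}}_{jk} + \sum_{(j,k) \in J^-} \hat{\lambda}_j \hat{\lambda}_k\, \hat{\mathbf{h}}_{jk}. \qquad (246)$$

Since $\hat{\lambda}_j = \dfrac{|\sigma_j|}{|\hat{\sigma}_j|}\, \lambda_j$, $\hat{\lambda}_j = O\left((1 + \rho^2)^{-1}\right)$. Therefore,

$$\hat{\lambda}_j \hat{\lambda}_k = O\left((1 + \rho^2)^{-2}\right). \qquad (247)$$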

Then in order to look at the dependences of 

gjk = IQQjk + 4/Cjfc 

hjk = -8n,k - S-Hkj + iLjk 



(247) 



(248) 



we examine the terms Gjk, l^jk, Hjk, ICjk and ^j^,. We begin with Gjk- From (212) 

G3k = ^^,^l^Jf,\/¥k■ 



As i9j = max((T^_^., (Tj^_^.) and = min(o-^^^., ay 



-2 --2\ 



Hence, 



From (213) 



£,=0((l + p2)-i). 

m^oiii + pT')- 



which yields Gj = 0((1 + p^)"^)- Putting these together, we obtain 

^^.,=0((l+p2)-l). 



(249) 



Next, from (215) 



n,k = SA\r-^- 
One can similarly show that



Lastly, from (2161 



?i,, = o((i+p2)-i). 



Kijk = 2^ji^kQjk- 



(250) 



Equations ([69j) and ( |247| give Qjk = 0{l + p^). Since -dj.tj = 0((1 + we get 

^,fc = 0((l + p2)-i) 
4, = 0((l + p2)-i). 



(251) 



Combining the results p49| ), ( pSOf , ( p5l] ) in ( [248| ) yields 

g, , = 0((l + p2)-i) 

h, , = 0((l + p2)-i). 



Using this with ( 247 ) in ( 246 ) , we finally obtain 

32 



hence 



r;„^o{{i + p')-^), 

i?,. = 0((l + p2)-3/2). 



□ 



Proof of Theorem [6] 

Proof. We start with the bound Ruq for the generalized noise model z{X) with no assumptions 
on the correlation. From ( 47 1 , the smoothed version of Ruq is given by 



(252) 



First, it is easy to observe that $\hat{R}_p$ and $\hat{R}_{p''}$ are parameters depending on $\rho$ and therefore do
not depend on the norm $\nu$ of the unfiltered noise pattern. It has been shown in Appendix [p.l| that
$\hat{r}_0 = O((1+\rho^2)^{-2})$. Also, from the proposition proved above, $\hat{R}_p = O((1+\rho^2)^{-1/2})$ and $\hat{R}_{p''} = O((1+\rho^2)^{-3/2})$.



Next, we examine the term $\hat{\nu}$. Due to the linearity of the convolution operator, the norm
$\hat{\nu}$ of the smoothed noise pattern $\hat{z}$ is linearly proportional to the norm $\nu$ of the original noise
pattern $z$. Therefore, the variation of $\hat{\nu}$ with $\nu$ is

$$\hat{\nu} = O(\nu).$$

Since the linear span of the Gaussian dictionary $\mathcal{D}$ is dense in $L^2(\mathbb{R}^2)$, one can assume a representation
of the noise pattern $z \in L^2(\mathbb{R}^2)$ in terms of Gaussian atoms. Therefore, similarly to $\hat{R}_p$ in (245),
$\hat{\nu}$ has the dependence

$$\hat{\nu} = O((1+\rho^2)^{-1/2})$$

on $\rho$. We thus obtain

$$\hat{R}_p \hat{\nu} = O(\nu (1+\rho^2)^{-1})$$

$$\hat{R}_{p''} \hat{\nu} = O(\nu (1+\rho^2)^{-2}).$$
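The two products follow by multiplying the individual rates. As a worked step, writing $\hat{R}_p$ and $\hat{R}_{p''}$ for the smoothed bounds and $\hat{\nu}$ for the smoothed noise norm, and combining the linear dependence on $\nu$ with the $(1+\rho^2)^{-1/2}$ dependence on $\rho$ into the single rate $\hat{\nu} = O(\nu(1+\rho^2)^{-1/2})$:

```latex
\begin{align*}
\hat{R}_p\,\hat{\nu}
  &= O\big((1+\rho^2)^{-1/2}\big)\cdot O\big(\nu\,(1+\rho^2)^{-1/2}\big)
   = O\big(\nu\,(1+\rho^2)^{-1}\big),\\
\hat{R}_{p''}\,\hat{\nu}
  &= O\big((1+\rho^2)^{-3/2}\big)\cdot O\big(\nu\,(1+\rho^2)^{-1/2}\big)
   = O\big(\nu\,(1+\rho^2)^{-2}\big).
\end{align*}
```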
Combining these results in (252) yields the joint variation of $\hat{R}_{u_0}$ with $\nu$ and $\rho$



' ^ V (1 + p^r' + p^)-^ K ^ V "^T^ 



We continue with the bound $\hat{Q}_{u_0}$ for the limited-correlation case. From (50), $\hat{Q}_{u_0}$ is given by

Quo = J „ . (253) 

Since the magnitude of the inner product between $z$ and a translated version of $p$ scales linearly
with the norm $\nu$ of $z$,

$$\hat{r}_{pz} = O(\nu) \qquad (254)$$

with respect to $\nu$. Next, we determine the dependence of $\hat{r}_{pz}$ on $\rho$. Let us assume a representation
of $z \in L^2(\mathbb{R}^2)$ with $M$ atoms in $\mathcal{D}$, which approximates $z$ sufficiently well.

M 



Here we denote the parameters of the m-th atom by which consists of the translation A^, 
the rotation T,„ and the scale change Vim- We denote the atom coefficients of z by <;„i- The 
inner product between z and a translated version of p is then given by 



K M 



[ p{X + tT)z{X)dX^ [ VAfe</)^,(X + <r) y^'^ra^x.Ax) dX 

K M 

= E E ^" / -^7. (X + tT) <^^„ (X) dX 



where 



- 1^ 2^^^ — 9 — 1 — 

k = lm=l ^\J\^km\ 



Notice that $d_{km}(tT)$ and $\Sigma_{km}$ are in the same form as the terms $c_{jk}$ and $\Sigma_{jk}$ defined in Section 3.2. Defining

\/\^km\ 

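we can write

$$\int_{\mathbb{R}^2} p(X + tT)\, z(X)\, dX \approx \sum_{k=1}^{K} \sum_{m=1}^{M} \lambda_k\, \zeta_m\, P_{km},$$

where $P_{km}$ is in the same form as the term $Q_{jk}$ in Section 3.2. This inner product for smoothed
patterns becomes

$$\int_{\mathbb{R}^2} \hat{p}(X + tT)\, \hat{z}(X)\, dX = \sum_{k=1}^{K} \sum_{m=1}^{M} \hat{\lambda}_k\, \hat{\zeta}_m\, \hat{P}_{km}.$$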
It has been shown in (69) that $\hat{\lambda}_j \hat{\lambda}_k \hat{Q}_{jk} = O((1+\rho^2)^{-1})$. Since the terms are in the same form,
we also have $\hat{\lambda}_k \hat{\zeta}_m \hat{P}_{km} = O((1+\rho^2)^{-1})$. Hence,

$$\int_{\mathbb{R}^2} \hat{p}(X + tT)\, \hat{z}(X)\, dX = O((1+\rho^2)^{-1}).$$



A good correlation bound $\hat{r}_{pz}$ must typically be comparable to the supremum of the magnitude
of the above inner product over $tT$. Therefore, one can assume that

$$\hat{r}_{pz} = O((1+\rho^2)^{-1}).$$
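The decay of the correlation with the filter size can be illustrated with one pattern atom and one noise atom. The sketch below is a simplified check, not the general argument. It assumes isotropic atoms $\exp(-\|X\|^2/\sigma^2)$, the smoothing model $\hat{\sigma}^2 = \sigma^2 + \rho^2$ with coefficients scaled by $\sigma^2/\hat{\sigma}^2$, and uses the closed-form overlap of two Gaussians. The function names and parameter values are hypothetical.

```python
import math

def smoothed_inner_product(lam, s1, zeta, s2, dist, rho):
    """Inner product of two smoothed isotropic atoms whose centres are `dist` apart."""
    v1 = s1**2 + rho**2                 # squared scale of the smoothed pattern atom
    v2 = s2**2 + rho**2                 # squared scale of the smoothed noise atom
    lam_hat = lam * s1**2 / v1          # smoothed pattern coefficient
    zeta_hat = zeta * s2**2 / v2        # smoothed noise coefficient
    # Closed-form integral over the plane of the product of the two Gaussians.
    overlap = math.pi * v1 * v2 / (v1 + v2) * math.exp(-dist**2 / (v1 + v2))
    return lam_hat * zeta_hat * overlap

def scaled_sup(rho, lam=1.0, s1=1.0, zeta=0.5, s2=2.0):
    """Supremum over shifts (attained at dist = 0) times (1 + rho^2)."""
    return abs(smoothed_inner_product(lam, s1, zeta, s2, 0.0, rho)) * (1.0 + rho**2)

for rho in (0.0, 1.0, 10.0, 100.0):
    print(rho, scaled_sup(rho))
```

In this two-atom case the supremum over shifts is linear in the noise coefficient and decays like $(1+\rho^2)^{-1}$, which matches the assumed behaviour of the correlation bound.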



The combination of this result with (254) yields the joint variation

$$\hat{r}_{pz} = O(\nu (1+\rho^2)^{-1}).$$



Now, comparing (252) and (253), we observe that $\hat{R}_{u_0}$ and $\hat{Q}_{u_0}$ have the same denominator, and
the terms $\hat{R}_p \hat{\nu}$ and $\hat{r}_{pz}$ in their numerators have the same variation with $\nu$ and $\rho$. Therefore, we
conclude that $\hat{Q}_{u_0}$ has the same variation with



o 



1 - v 



□ 